Methods and materials for assessing and treating cancer

ABSTRACT

Provided herein are methods and materials for detecting and/or treating a subject (e.g., a human) having cancer. In some embodiments, methods and materials for identifying a subject as having cancer (e.g., a localized cancer) are provided in which the presence of member(s) of two or more classes of biomarkers are detected. In some embodiments, methods and materials for identifying a subject as having cancer (e.g., a localized cancer) are provided in which the presence of member(s) of at least one class of biomarkers and the presence of aneuploidy are detected. In some embodiments, methods described herein provide increased sensitivity and/or specificity in the detection of cancer in a subject (e.g., a human).

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation-in-part of International Application No. PCT/US2018/045669, filed Aug. 7, 2018, which claims the benefit of U.S. Patent Application Ser. No. 62/542,167, filed on Aug. 7, 2017, U.S. Patent Application Ser. No. 62/542,144, filed on Aug. 7, 2017, U.S. Patent Application Ser. No. 62/542,164, filed on Aug. 7, 2017, U.S. Patent Application Ser. No. 62/594,245, filed on Dec. 4, 2017, U.S. Patent Application Ser. No. 62/618,232, filed on Jan. 17, 2018, U.S. Patent Application Ser. No. 62/628,759, filed on Feb. 9, 2018, and U.S. Patent Application Ser. No. 62/629,870, filed on Feb. 13, 2018. The disclosures of the prior applications are considered part of (and are incorporated by reference in) the disclosure of this application.

STATEMENT REGARDING FEDERAL FUNDING

This invention was made with government support under CA062924 and HG007804 awarded by the National Institutes of Health. The government has certain rights in the invention.

SEQUENCE LISTING

The instant application includes a Sequence Listing in electronic format submitted to the United States Patent and Trademark Office via the electronic filing system, and is hereby incorporated by reference in its entirety. Said sequence listing, created on Aug. 6, 2018, is named 448070306WO1SL.txt and is 208,305 bytes in size.

ELECTRONICALLY-FILED TABLES

The instant application includes tables in electronic format submitted to the United States Patent and Trademark Office via the electronic filing system. The ASCII text files, each of which was created on Aug. 7, 2018 and is incorporated herein by reference in its entirety, are listed below by file name and size:

Table1.txt: 152,000 bytes
Table2.txt: 351,000 bytes
Table3.txt: 438,000 bytes
Table4.txt: 1,081,000 bytes
Table5.txt: 31,000 bytes
Table6.txt: 103,000 bytes
Table7.txt: 25,000 bytes
Table8.txt: 59,000 bytes
Table9.txt: 38,000 bytes
Table10.txt: 22,000 bytes
Table11.txt: 17,000 bytes
Table12.txt: 14,000 bytes
Table13.txt: 104,000 bytes
Table14.txt: 106,000 bytes
Table15.txt: 370,000 bytes
Table16.txt: 262,000 bytes
Table17.txt: 8,000 bytes
Table18.txt: 52,000 bytes
Table19.txt: 41,000 bytes
Table20.txt: 14,000 bytes
Table21.txt: 6,000 bytes
Table22.txt: 19,000 bytes
Table23.txt: 6,000 bytes
Table24.txt: 42,000 bytes
Table25.txt: 25,000 bytes
Table26.txt: 14,000 bytes
Table27.txt: 5,000 bytes
Table28.txt: 10,000 bytes
Table29.txt: 9,000 bytes
Table30.txt: 3,000 bytes
Table31.txt: 2,000 bytes
Table32.txt: 9,000 bytes
Table33.txt: 3,000 bytes
Table34.txt: 22,000 bytes
Table35.txt: 1,536,000 bytes
Table36.txt: 1,591,000 bytes
Table37.txt: 13,000 bytes
Table38.txt: 5,000 bytes
Table39.txt: 30,000 bytes
Table40.txt: 9,000 bytes
Table41.txt: 4,000 bytes
Table42.txt: 8,000 bytes
Table43.txt: 25,000 bytes
Table44.txt: 11,000 bytes
Table45.txt: 11,000 bytes
Table46.txt: 18,000 bytes
Table47.txt: 18,000 bytes
Table48.txt: 8,000 bytes
Table49.txt: 167,000 bytes
Table50.txt: 312,000 bytes
Table51.txt: 20,000 bytes
Table52.txt: 1,000 bytes
Table53.txt: 3,000 bytes
Table54.txt: 3,000 bytes
Table55.txt: 8,000 bytes
Table56.txt: 1,000 bytes
Table57.txt: 14,000 bytes
Table58.txt: 3,000 bytes
Table59.txt: 309,000 bytes

LENGTHY TABLES

The patent application contains a lengthy table section. A copy of the table is available in electronic form from the USPTO website (http://seqdata.uspto.gov/?pageRequest=docDetail&DocID=US20190256924A1). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

BACKGROUND

1. Technical Field

Provided herein are methods and materials for detecting and/or treating subjects (e.g., humans) having cancer. In some embodiments, methods and materials for identifying a subject as having cancer (e.g., a localized cancer) are provided in which the presence of two or more members of two or more classes of biomarkers are detected. In some embodiments, methods and materials for identifying a subject as having cancer (e.g., a localized cancer) are provided in which the presence of two or more members of at least one class of biomarkers and the presence of aneuploidy are detected. In some embodiments, methods described herein provide increased sensitivity and/or specificity of detecting cancer in a subject (e.g., a human).

2. Background Information

Cancers will kill 592,000 Americans this year and, according to the Centers for Disease Control, cancers will soon be the leading cause of death in this country. How can this dire situation be averted? The vast majority of translational cancer research today is focused on prolonging survival in patients with advanced disease. Our research perspective is different: in the long term, prevention is always better than cure. Examples of the value of this perspective are abundant, ranging from infectious to cardiovascular diseases. Cardiovascular diseases are particularly relevant because the combination of primary and secondary prevention measures for these diseases has reduced deaths by 75% in the last 60 years. In contrast, overall cancer deaths have barely changed over the same time period.

Earlier detection through the application of blood tests for cancer can be viewed as a form of secondary prevention. The last three letters of the word “earlier” are particularly important. For all cancers that have been studied, the probability for cure is much higher with early, localized disease than for advanced disease. The earlier the stage, the more likely the tumor can be cured by surgery alone. Moreover, cancers do not have to be detected when they are at their initial stages to be cured. Theoretically, the responses to therapy are dictated by the total number of cancer cells prior to therapy and the rates of mutation in human cells. The more cancer cells, the more likely that at least one of them will contain or develop a mutation(s) that confers resistance to any form of therapy, be it conventional chemotherapy, radiotherapy, targeted therapy, or immunotherapy. Clinically, a large number of studies have shown that drugs can be curative in the adjuvant setting but not in patients with advanced disease. For example, nearly half of the patients with Stage III colorectal cancer who would die from their disease can be cured by adjuvant therapy, but virtually no patients with Stage IV colorectal cancers can be cured with the same regimens.

There is a strong correlation between tumor stage and prognosis in many cancers (Ansari D, et al. (2017) Relationship between tumour size and outcome in pancreatic ductal adenocarcinoma. Br J Surg 104(5):600-607). Very few patients with cancers of the lung, colon, esophagus, or stomach who have distant metastasis at the time of diagnosis survive for more than five years (Howlader N, et al. (2016) SEER Cancer Statistics Review, 1975-2013, National Cancer Institute. Bethesda, Md., http://seer.cancer.gov/csr/1975_2013/, based on November 2015 SEER data submission, posted to the SEER web site, April 2016). The size of cancers is also important in a general sense, in that smaller tumors have less often metastasized than larger tumors at the time of diagnosis, and are therefore more likely to be curable by surgery alone. Even when cancers have metastasized to distant sites, a smaller burden of disease is much more easily managed than bulky lesions (Bozic I, et al. (2013) Evolutionary dynamics of cancer in response to targeted combination therapy. Elife 2:e00747). Thus, adjuvant chemotherapeutic agents administered to patients with micro-metastases stemming from a colorectal cancer can be curative in nearly 50% of cases (Semrad T J, Fahrni A R, Gong I Y, & Khatri V P (2015) Integrating Chemotherapy into the Management of Oligometastatic Colorectal Cancer: Evidence-Based Approach Using Clinical Trial Findings. Ann Surg Oncol 22 Suppl 3:S855-862; Moertel C G, et al. (1995) Fluorouracil plus levamisole as effective adjuvant therapy after resection of stage III colon carcinoma: a final report. Ann Intern Med 122(5):321-326; Andre T, et al. (2009) Improved overall survival with oxaliplatin, fluorouracil, and leucovorin as adjuvant treatment in stage II or III colon cancer in the MOSAIC trial. J Clin Oncol 27(19):3109-3116). The same chemotherapeutic agents delivered to patients with metastatic lesions that are radiologically visible produce virtually no cures (Dy G K, et al. (2009) Long-term survivors of metastatic colorectal cancer treated with systemic chemotherapy alone: a North Central Cancer Treatment Group review of 3811 patients, N0144. Clin Colorectal Cancer 8(2):88-93).

It is therefore evident that the earlier detection of cancers is one key to reducing deaths from these diseases, including pancreatic cancer. In addition to offering the possibility of surgical resection, newly developed adjuvant chemotherapeutic and emerging immunotherapy regimens will undoubtedly prove more efficacious in patients with minimal disease beyond that which is curable surgically (Huang A C, et al. (2017) T-cell invigoration to tumour burden ratio associated with anti-PD-1 response. Nature 545(7652):60-65). Biomarkers in the circulation provide one of the best ways, in principle, to detect cancers at an earlier stage. Historically, the type of biomarkers used to monitor cancers were proteins (Liotta L A & Petricoin E F, 3rd (2003) The promise of proteomics. Clin Adv Hematol Oncol 1(8):460-462), and included carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), and cancer antigen 125 (CA125). These biomarkers have proven useful for following patients with known disease but none have been approved for screening purposes, in part because of their low sensitivity or specificity (Lennon A M & Goggins M (2010) Diagnostic and Therapeutic Response Markers. Pancreatic Cancer, (Springer New York, N.Y., N.Y.), pp 675-701; Clarke-Pearson D L (2009) Clinical practice. Screening for ovarian cancer. N Engl J Med 361(2):170-177; Locker G Y, et al. (2006) ASCO 2006 update of recommendations for the use of tumor markers in gastrointestinal cancer. J Clin Oncol 24(33):5313-5327). More recently, mutant DNA has been explored as a biomarker. The concept underlying this approach, often called “liquid biopsies,” is that cancer cells, like normal self-renewing cells, turn over frequently. DNA released from the dying cells can escape into bodily fluids such as urine, stool, and plasma (Haber D A & Velculescu V E (2014) Blood-based analyses of cancer: circulating tumor cells and circulating tumor DNA. Cancer Discov 4(6):650-661; Dawson S J, et al. (2013) Analysis of circulating tumor DNA to monitor metastatic breast cancer. N Engl J Med 368(13):1199-1209; Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224; Kinde I, et al. (2013) Evaluation of DNA from the Papanicolaou test to detect ovarian and endometrial cancers. Science translational medicine 5(167):167ra164; Wang Y, et al. (2015) Detection of somatic mutations and HPV in the saliva and plasma of patients with head and neck squamous cell carcinomas. Science translational medicine 7(293):293ra104; Wang Y, et al. (2015) Detection of tumor-derived DNA in cerebrospinal fluid of patients with primary tumors of the brain and spinal cord. Proc Natl Acad Sci USA 112(31):9704-9709; Wang Y, et al. (2016) Diagnostic potential of tumor DNA from ovarian cyst fluid. Elife 5; Springer S, et al. (2015) A Combination of Molecular Markers and Clinical Features Improve the Classification of Pancreatic Cysts. Gastroenterology 149(6):1501-1510; Forshew T, et al. (2012) Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Science translational medicine 4(136):136ra168; Vogelstein B & Kinzler K W (1999) Digital PCR. Proc Natl Acad Sci USA 96(16):9236-9241; Dressman D, Yan H, Traverso G, Kinzler K W, & Vogelstein B (2003) Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. Proc Natl Acad Sci USA 100(15):8817-8822). An advantage of using mutant DNA in the circulation as a biomarker is its exquisite specificity. Every cell within a cancer has a core set of somatic mutations in driver genes that are responsible for their clonal growth (Vogelstein B, et al. (2013) Cancer genome landscapes. Science 339(6127):1546-1558). In contrast, normal cells do not clonally expand during adulthood and the fraction of normal cells that have any specific somatic mutation is extremely low.

Most studies of circulating tumor DNA (ctDNA) have focused on following patients with cancer rather than on evaluating its use in screening settings. Available data indicate that ctDNA is elevated in >85% of patients with advanced forms of many cancer types (Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224; Wang Y, et al. (2015) Detection of somatic mutations and HPV in the saliva and plasma of patients with head and neck squamous cell carcinomas. Science translational medicine 7(293):293ra104). However, a considerably smaller fraction of patients with earlier stages of cancer have detectable levels of ctDNA in their plasma (Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224; Wang Y, et al. (2015) Detection of somatic mutations and HPV in the saliva and plasma of patients with head and neck squamous cell carcinomas. Science translational medicine 7(293):293ra104).

The majority of localized cancers can be cured by surgery alone, without any systemic therapy (Siegel et al., 2017 CA Cancer J Clin 67:7-30). Once distant metastasis has occurred, however, surgical excision is rarely curative. One major goal in cancer research is therefore the detection of cancers before they metastasize to distant sites. Depending on the cancer type, 20 to 30 years appear to be required for typical cancers in adults to progress from incipient neoplastic lesions to late stage cancers (Vogelstein et al., 2013 Science 339:1546-1558; Jones et al., 2008 Proc Natl Acad Sci USA 105:4283-4288; and Yachida et al., 2012 Clin Cancer Res 18:6339-6347). Only in the last few years of this long process do neoplastic cells appear to successfully seed and give rise to metastatic lesions (Vogelstein et al., 2013 Science 339:1546-1558; Jones et al., 2008 Proc Natl Acad Sci USA 105:4283-4288; Yachida et al., 2012 Clin Cancer Res 18:6339-6347; and Vogelstein et al., 2015 N Engl J Med 373:1895-1898). Thus, there is a wide window of opportunity to detect cancers prior to the onset of metastasis. Once large, metastatic tumors are formed, however, current therapies are not effective (Bozic et al., 2013 Elife 2:e00747; Semrad et al., 2015 Ann Surg Oncol 22(Suppl 3):S855-862; Moertel et al., 1995 Ann Intern Med 122:321-326; Huang et al., 2017 Nature 545:60-65).

Pancreatic ductal adenocarcinoma (hereinafter “pancreatic cancer”) is the third leading cause of cancer death and is predicted to become the second most common cause in the United States by 2030 (Rahib L, et al. (2014) Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer Res 74(11):2913-2921). Pancreatic cancer is notoriously lethal, with fewer than 9% of patients surviving five years after diagnosis (Siegel R L, Miller K D, & Jemal A (2016) Cancer statistics, 2016. CA Cancer J Clin 66(1):7-30). The poor prognosis of patients with pancreatic cancer is in part due to the fact that 80% to 85% of patients are diagnosed at advanced stages, when either tumor invasion into the surrounding major vessels or distant metastases are evident upon radiologic studies (Ryan D P, Hong T S, & Bardeesy N (2014) Pancreatic adenocarcinoma. N Engl J Med 371(22):2140-2141). At this late point in the disease, pancreatic cancer is not amenable to surgical resection, and the 3-year survival rate is <5%. In contrast, a five-year survival of almost 60% is reported for very small, localized tumors; among resectable cancers, the smaller the tumor, the better the prognosis (Ansari D, et al. (2017) Relationship between tumour size and outcome in pancreatic ductal adenocarcinoma. Br J Surg 104(5):600-607; Jung K W, et al. (2007) Clinicopathological aspects of 542 cases of pancreatic cancer: a special emphasis on small pancreatic cancer. J Korean Med Sci 22 Suppl:S79-85; Egawa S, et al. (2004) Clinicopathological aspects of small pancreatic cancer. Pancreas 28(3):235-240; Ishikawa O, et al. (1999) Minute carcinoma of the pancreas measuring 1 cm or less in diameter—collective review of Japanese case reports. Hepatogastroenterology 46(25):8-15; Tsuchiya R, et al. (1986) Collective review of small carcinomas of the pancreas. Ann Surg 203(1):77-81).

Pancreatic cancer is not different from other cancers with respect to its strong correlation between tumor stage and prognosis (Ansari D, et al. (2017) Relationship between tumour size and outcome in pancreatic ductal adenocarcinoma. Br J Surg 104(5):600-607).

There is a continuing need in the art to increase the sensitivity of detection of resectable or otherwise treatable cancers under conditions that preserve high specificity.

The Papanicolaou (Pap) test has dramatically decreased the incidence and mortality of cervical cancer in the screened population. Unfortunately, the Pap test is generally unable to detect endometrial or ovarian cancers (L. Geldenhuys, M. L. Murray, Sensitivity and specificity of the Pap smear for glandular lesions of the cervix and endometrium. Acta cytologica 51, 47-50 (2007); A. B. Ng, J. W. Reagan, S. Hawliczek, B. W. Wentz, Significance of endometrial cells in the detection of endometrial carcinoma and its precursors. Acta cytologica 18, 356-361 (1974); P. F. Schnatz, M. Guile, D. M. O'Sullivan, J. I. Sorosky, Clinical significance of atypical glandular cells on cervical cytology. Obstetrics and gynecology 107, 701-708 (2006); C. Zhao, A. Florea, A. Onisko, R. M. Austin, Histologic follow-up results in 662 patients with Pap test findings of atypical glandular cells: results from a large academic womens hospital laboratory employing sensitive screening methods. Gynecologic oncology 114, 383-389 (2009)). In light of the success of the Pap test in detecting early-stage, curable cervical cancers, ovarian and endometrial cancers are currently the most lethal and most common gynecologic malignancies, respectively, in countries where Pap tests are routinely performed (N. Howlader et al., SEER Cancer Statistics Review, 1975-2014, National Cancer Institute. (2017)). Together, endometrial and ovarian cancers account for approximately 25,000 deaths each year and are the third leading cause of cancer-related mortality in women in the United States (N. Howlader et al., SEER Cancer Statistics Review, 1975-2014, National Cancer Institute. (2017)). Most of these deaths are caused by high-grade tumor subtypes, which tend to metastasize prior to the onset of symptoms (R. J. Kurman, M. Shih Ie, The origin and pathogenesis of epithelial ovarian cancer: a proposed unifying theory. The American journal of surgical pathology 34, 433-443 (2010); K. N. Moore, A. N. Fader, Uterine papillary serous carcinoma. Clin Obstet Gynecol 54, 278-291 (2011)).

Endometrial cancer is the most common gynecologic malignancy, with 61,380 estimated new cases in 2017 in the United States (N. Howlader et al., SEER Cancer Statistics Review, 1975-2014, National Cancer Institute. (2017)). The incidence of endometrial cancer has been rising with increased obesity and increased life expectancy (M. Arnold et al., Global burden of cancer attributable to high body-mass index in 2012: a population-based study. The Lancet. Oncology 16, 36-46 (2015)). At the same time, relative survival has not improved over the past decades (N. Howlader et al., SEER Cancer Statistics Review, 1975-2014, National Cancer Institute. (2017); L. Rahib et al., Projecting cancer incidence and deaths to 2030: the unexpected burden of thyroid, liver, and pancreas cancers in the United States. Cancer research 74, 2913-2921 (2014)). Much effort has been directed towards developing a screening test for this cancer type. The most common diagnostic test is transvaginal ultrasound (TVUS), which measures the thickness of the endometrium. The potential of TVUS as a screening test is undermined by its inability to reliably distinguish between benign and malignant lesions, subjecting women without cancer to unnecessary invasive procedures and their associated complications. Its high false positive rate is demonstrated by the fact that as few as one in 50 women who tested positive by TVUS was proven to have endometrial cancer after undergoing additional diagnostic procedures (Jacobs et al., Sensitivity of transvaginal ultrasound screening for endometrial cancer in postmenopausal women: a case-control study within the UKCTOCS cohort. The Lancet. Oncology 12, 38-48 (2011)).
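The "one in 50" figure above is a positive predictive value (PPV) of about 2%. As a minimal illustrative sketch, and not a method described in this application, the snippet below shows how PPV depends on sensitivity, specificity, and disease prevalence through Bayes' rule. The function name and the sensitivity, specificity, and prevalence values are hypothetical assumptions chosen only to show why specificity dominates PPV when a disease is rare in the screened population.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Fraction of positive test results that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Figure cited in the text: 1 confirmed cancer per 50 screen-positive women.
ppv_from_counts = 1 / 50

# Hypothetical values (assumptions, not from the source): a test with 80%
# sensitivity applied to a population with 0.1% prevalence, at two
# different specificities.
ppv_99 = positive_predictive_value(0.80, 0.99, 0.001)
ppv_999 = positive_predictive_value(0.80, 0.999, 0.001)

print(f"PPV from cited counts: {ppv_from_counts:.1%}")
print(f"PPV at 99% specificity: {ppv_99:.1%}")
print(f"PPV at 99.9% specificity: {ppv_999:.1%}")
```

Under these assumed inputs, raising specificity from 99% to 99.9% increases the PPV several-fold, which is consistent with the emphasis in this section on preserving high specificity in screening settings.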

Ovarian cancer is the second most common gynecologic malignancy in the U.S. and Europe. It is often diagnosed at a late stage, when the 5-year survival is less than 30% (N. Howlader et al., SEER Cancer Statistics Review, 1975-2014, National Cancer Institute. (2017)). The high mortality has made the development of an effective screening test a high priority. Large randomized trials have assessed the use of CA-125 and TVUS as potential screening tests for ovarian cancer (Buys et al., Effect of screening on ovarian cancer mortality: the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Randomized Controlled Trial. JAMA 305, 2295-2303 (2011); Kobayashi et al., A randomized study of screening for ovarian cancer: a multicenter study in Japan. Int J Gynecol Cancer 18, 414-420 (2008); Jacobs et al., Ovarian cancer screening and mortality in the UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS): a randomised controlled trial. Lancet 387, 945-956 (2016); Menon et al., Risk Algorithm Using Serial Biomarker Measurements Doubles the Number of Screen-Detected Cancers Compared With a Single-Threshold Rule in the United Kingdom Collaborative Trial of Ovarian Cancer Screening. J Clin Oncol 33, 2062-2071 (2015)). However, screening with current diagnostic approaches is not recommended for the general population, as it leads to “important harms, including major surgical interventions in women who do not have cancer” (V. A. Moyer, U.S. P. S. T. Force, Screening for ovarian cancer: U.S. Preventive Services Task Force reaffirmation recommendation statement. Annals of internal medicine 157, 900-904 (2012)). Thus, new diagnostic approaches are urgently needed.

Among ovarian cancers, high-grade serous carcinomas (HGSC) account for 90% of all ovarian cancer deaths. Increasing evidence suggests that most HGSC arise in the fallopian tube and subsequently implant on the ovarian surface (R. J. Kurman, M. Shih Ie, Molecular pathogenesis and extraovarian origin of epithelial ovarian cancer--shifting the paradigm. Human pathology 42, 918-931 (2011); Lee et al., A candidate precursor to serous carcinoma that originates in the distal fallopian tube. The Journal of pathology 211, 26-35 (2007); Eckert et al., Genomics of Ovarian Cancer Progression Reveals Diverse Metastatic Trajectories Including Intraepithelial Metastasis to the Fallopian Tube. Cancer Discov 6, 1342-1351 (2016); A. M. Karst, K. Levanon, R. Drapkin, Modeling high-grade serous ovarian carcinogenesis from the fallopian tube. Proc Natl Acad Sci USA 108, 7547-7552 (2011); Zhai et al., High-grade serous carcinomas arise in the mouse oviduct via defects linked to the human disease. The Journal of pathology 243, 16-25 (2017); R. J. Kurman, M. Shih Ie, The Dualistic Model of Ovarian Carcinogenesis: Revisited, Revised, and Expanded. Am J Pathol 186, 733-747 (2016)). A recent prospective study of symptomatic women reported that most early diagnosed HGSCs have extra-ovarian origins (Gilbert et al., Assessment of symptomatic women for early diagnosis of ovarian cancer: results from the prospective DOvE pilot project. The Lancet. Oncology 13, 285-291 (2012)). This might explain the low sensitivity of TVUS for early disease, when no ovarian abnormalities are detectable. Multimodal screening with serum CA-125 levels improves sensitivity; however, CA-125 lacks specificity and is elevated in a variety of common benign conditions (H. Meden, A. Fattahi-Meibodi, CA 125 in benign gynecological conditions. Int J Biol Markers 13, 231-237 (1998)).

Unlike markers associated with neoplasia, cancer driver gene mutations are causative agents of neoplasia and are absent in non-neoplastic conditions. It has been shown that tumor DNA could be detected in the vaginal tract of women with ovarian cancer (Erickson et al., Detection of somatic TP53 mutations in tampons of patients with high-grade serous ovarian cancer. Obstetrics and gynecology 124, 881-885 (2014)). Furthermore, a recent proof-of-principle study showed that endometrial and ovarian cancers shed cells that collect at the cervix, allowing detectable levels of tumor DNA to be found in the fluids obtained during routine Pap tests (Kinde et al., Evaluation of DNA from the Papanicolaou test to detect ovarian and endometrial cancers. Sci Transl Med 5, 167ra164 (2013)). These cells are sampled with a brush (a “Pap brush”) that is inserted into the endocervical canal. The brush is then dipped into preservative fluid. For the detection of cervical cancers, cells from the fluid are applied to a slide for cytologic examination (the classic Pap smear). Additionally, DNA is often purified from the fluid to search for HPV sequences.

Bladder cancer (BC) is the most common malignancy of the urinary tract. According to the American Cancer Society, 79,030 new cases of bladder cancer and 18,540 deaths are estimated to occur in the United States alone in 2017 [Siegel R L, Miller K D, Jemal A (2017) Cancer Statistics, 2017. CA Cancer J Clin 67:7-30]. Predominantly of urothelial histology, invasive BC arises from non-invasive papillary or flat precursors. Many BC patients suffer multiple relapses prior to progression, providing ample lead-time for early detection and treatment prior to metastasis [Netto G J (2013) Clinical applications of recent molecular advances in urologic malignancies: no longer chasing a “mirage”?. Adv Anat Pathol 20:175-203]. Urine cytology and cystoscopy with transurethral biopsy (TURB) are currently the gold standard for diagnosis and follow-up in bladder cancer. While urine cytology has value for the detection of high-grade neoplasms, it is unable to detect the vast majority of low-grade tumors [Netto G J, Tafe L J (2016) Emerging Bladder Cancer Biomarkers and Targets of Therapy. Urol Clin North Am 43:63-76; Lotan Y, Roehrborn C G (2003) Sensitivity and specificity of commonly available bladder tumor markers versus cytology: results of a comprehensive literature review and meta-analyses. Urology 61:109-18; discussion 118; Zhang M L, Rosenthal D L, VandenBussche C J (2016) The cytomorphological features of low-grade urothelial neoplasms vary by specimen type. Cancer Cytopathol 124:552-564]. This fact, together with the high cost and invasive nature of repeated cystoscopy and TURB procedures, has led to many attempts to develop novel noninvasive strategies. These include urine or serum based genetic and protein assays for screening and surveillance [Kawauchi et al., (2009) 9p21 Index as Estimated by Dual-Color Fluorescence in Situ Hybridization is Useful to Predict Urothelial Carcinoma Recurrence in Bladder Washing Cytology. Hum Pathol 40:1783-1789; Kruger S, Mess F, Bohle A, Feller A C (2003) Numerical aberrations of chromosome 17 and the 9p21 locus are independent predictors of tumor recurrence in non-invasive transitional cell carcinoma of the urinary bladder. Int J Oncol 23:41-48; Skacel et al., (2003) Multitarget fluorescence in situ hybridization assay detects transitional cell carcinoma in the majority of patients with bladder cancer and atypical or negative urine cytology. J Urol 169:2101-2105; Sarosdy et al., (2006) Use of a multitarget fluorescence in situ hybridization assay to diagnose bladder cancer in patients with hematuria. J Urol 176:44-47; Moonen et al., (2007) UroVysion compared with cytology and quantitative cytology in the surveillance of non-muscle-invasive bladder cancer. Eur Urol 51:1275-80; discussion 1280; Fradet Y, Lockhard C (1997) Performance characteristics of a new monoclonal antibody test for bladder cancer: ImmunoCyt trade mark. Can J Urol 4:400-405; Yafi et al., (2015) Prospective analysis of sensitivity and specificity of urinary cytology and other urinary biomarkers for bladder cancer. Urol Oncol 33:66.e25-66.e31; Serizawa et al., (2010) Integrated genetic and epigenetic analysis of bladder cancer reveals an additive diagnostic value of FGFR3 mutations and hypermethylation events. Int J Cancer; Kinde et al., (2013) TERT promoter mutations occur early in urothelial neoplasia and are biomarkers of early disease and disease recurrence in urine. Cancer Res 73:7162-7167; Hurst C D, Platt F M, Knowles M A (2014) Comprehensive mutation analysis of the TERT promoter in bladder cancer and detection of mutations in voided urine. Eur Urol 65:367-369; Wang et al., (2014) TERT promoter mutations are associated with distant metastases in upper tract urothelial carcinomas and serve as urinary biomarkers detected by a sensitive castPCR. Oncotarget 5:12428-12439; Ralla et al., (2014) Nucleic acid-based biomarkers in body fluids of patients with urologic malignancies. Crit Rev Clin Lab Sci 51:200-231; Ellinger J, Muller S C, Dietrich D (2015) Epigenetic biomarkers in the blood of patients with urological malignancies. Expert Rev Mol Diagn 15:505-516; Bansal N, Gupta A, Sankhwar S N, Mahdi A A (2014) Low- and high-grade bladder cancer appraisal via serum-based proteomics approach. Clin Chim Acta 436:97-103; Goodison S, Chang M, Dai Y, Urquidi V, Rosser C J (2012) A multi-analyte assay for the non-invasive detection of bladder cancer. PLoS One 7:e47469; Allory et al., (2014) Telomerase reverse transcriptase promoter mutations in bladder cancer: high frequency across stages, detection in urine, and lack of association with outcome. Eur Urol 65:360-366]. Currently available U.S. Food and Drug Administration (FDA) approved assays include the ImmunoCyt test (Scimedx Corp), the nuclear matrix protein 22 (NMP22) immunoassay test (Matritech), and multitarget FISH (UroVysion) [Kawauchi et al., (2009) 9p21 Index as Estimated by Dual-Color Fluorescence in Situ Hybridization is Useful to Predict Urothelial Carcinoma Recurrence in Bladder Washing Cytology. Hum Pathol 40:1783-1789; Kruger S, Mess F, Bohle A, Feller A C (2003) Numerical aberrations of chromosome 17 and the 9p21 locus are independent predictors of tumor recurrence in non-invasive transitional cell carcinoma of the urinary bladder. Int J Oncol 23:41-48; Skacel et al., (2003) Multitarget fluorescence in situ hybridization assay detects transitional cell carcinoma in the majority of patients with bladder cancer and atypical or negative urine cytology. J Urol 169:2101-2105; Sarosdy et al., (2006) Use of a multitarget fluorescence in situ hybridization assay to diagnose bladder cancer in patients with hematuria. J Urol 176:44-47; Moonen et al., (2007) UroVysion compared with cytology and quantitative cytology in the surveillance of non-muscle-invasive bladder cancer. Eur Urol 51:1275-80; discussion 1280; Fradet Y, Lockhard C (1997) Performance characteristics of a new monoclonal antibody test for bladder cancer: ImmunoCyt trade mark. Can J Urol 4:400-405; Yafi et al., (2015) Prospective analysis of sensitivity and specificity of urinary cytology and other urinary biomarkers for bladder cancer. Urol Oncol 33:66.e25-66.e31]. Sensitivities between 62% and 69% and specificities between 79% and 89% have been reported for some of these tests. However, due to assay performance inconsistencies, cost or required technical expertise, integration of such assays into routine clinical practice has not yet occurred.

Bladder cancer typically falls into three types that begin in cells in the lining of the bladder. In some embodiments, bladder cancers are named for the type of cells that become malignant (cancerous), including transitional cell carcinoma, squamous cell carcinoma, and adenocarcinoma. Transitional cell carcinomas begin in cells in the innermost tissue layer of the bladder. Transitional cell carcinomas can be low-grade or high-grade. Low-grade transitional cell carcinomas can recur after treatment, but rarely spread into the muscle layer of the bladder or to other parts of the body. High-grade transitional cell carcinomas can recur after treatment and often spread into the muscle layer of the bladder, to other parts of the body, and to lymph nodes. Almost all deaths from bladder cancer are due to high-grade disease. Squamous cell carcinomas begin in squamous cells, which are thin, flat cells that may form in the bladder after long-term infection or irritation. Adenocarcinomas begin in glandular (secretory) cells that are found in the lining of the bladder, and are a very rare type of bladder cancer.

High rates of activating mutations in the upstream promoter of the TERT gene are found in the majority of BC as well as in other cancer types [Huang F W, Hodis E, Xu M J, Kryukov G V, Chin L, Garraway L A (2013) Highly recurrent TERT promoter mutations in human melanoma. Science 339:957-959; Killela et al., (2013) TERT promoter mutations occur frequently in gliomas and a subset of tumors derived from cells with low rates of self-renewal. Proc Natl Acad Sci USA 110:6021-6026; Scott G A, Laughlin T S, Rothberg P G (2014) Mutations of the TERT promoter are common in basal cell carcinoma and squamous cell carcinoma. Mod Pathol 27:516-523]. TERT promoter mutations predominantly affect two hot spots, g.1295228 C>T and g.1295250 C>T. They lead to the generation of CCGGAA/T or GGAA/T motifs, altering the binding site for ETS transcription factors and subsequently increasing TERT promoter activity [Huang F W, Hodis E, Xu M J, Kryukov G V, Chin L, Garraway L A (2013) Highly recurrent TERT promoter mutations in human melanoma. Science 339:957-959; Horn et al., (2013) TERT promoter mutations in familial and sporadic melanoma. Science 339:959-961]. TERT promoter mutations occur in up to 80% of invasive urothelial carcinomas of the bladder and upper urinary tract as well as in several of its histologic variants [Kinde et al., (2013) TERT promoter mutations occur early in urothelial neoplasia and are biomarkers of early disease and disease recurrence in urine. Cancer Res 73:7162-7167; Killela et al., (2013) TERT promoter mutations occur frequently in gliomas and a subset of tumors derived from cells with low rates of self-renewal. Proc Natl Acad Sci USA 110:6021-6026; Allory et al., (2014) Telomerase reverse transcriptase promoter mutations in bladder cancer: high frequency across stages, detection in urine, and lack of association with outcome. Eur Urol 65:360-366; Cowan et al., (2016) Detection of TERT promoter mutations in primary adenocarcinoma of the urinary bladder. Hum Pathol 53:8-13; Nguyen et al., (2016) High prevalence of TERT promoter mutations in micropapillary urothelial carcinoma. Virchows Arch 469:427-434]. Moreover, TERT promoter mutations occur in 60-80% of BC precursors, including Papillary Urothelial Neoplasms of Low Malignant Potential [Rodriguez et al., (2017) Spectrum of genetic mutations in de novo PUNLMP of the urinary bladder. Virchows Arch], non-invasive Low Grade Papillary Urothelial Carcinoma, non-invasive High Grade Papillary Urothelial Carcinoma and “flat” Carcinoma in Situ (CIS), as well as in urinary cells from a subset of these patients [Kinde et al., (2013) TERT promoter mutations occur early in urothelial neoplasia and are biomarkers of early disease and disease recurrence in urine. Cancer Res 73:7162-7167]. TERT promoter mutations have thus been established as the most common genetic alteration in BC [Kinde et al., (2013) TERT promoter mutations occur early in urothelial neoplasia and are biomarkers of early disease and disease recurrence in urine. Cancer Res 73:7162-7167; Cheng L, Montironi R, Lopez-Beltran A (2017) TERT Promoter Mutations Occur Frequently in Urothelial Papilloma and Papillary Urothelial Neoplasm of Low Malignant Potential. Eur Urol 71:497-498]. Other oncogene-activating mutations include those in FGFR3, RAS and PIK3CA, which have been shown to occur in a high fraction of non-muscle invasive bladder cancers [International Agency for Research on Cancer. (2016) WHO Classification of Tumours of the Urinary System and Male Genital Organs. World Health Organization; 4th edition; Netto G J (2011) Molecular biomarkers in urothelial carcinoma of the bladder: are we there yet?. Nat Rev Urol 9:41-51]. In muscle-invasive bladder cancers, mutations in TP53, CDKN2A, MLL and ERBB2 are also frequently found [Netto G J (2011) Molecular biomarkers in urothelial carcinoma of the bladder: are we there yet?. Nat Rev Urol 9:41-51; Mo et al., (2007) Hyperactivation of Ha-ras oncogene, but not Ink4a/Arf deficiency, triggers bladder tumorigenesis. J Clin Invest 117:314-325; Sarkis et al., (1993) Nuclear overexpression of p53 protein in transitional cell bladder carcinoma: a marker for disease progression. J Natl Cancer Inst 85:53-59; Lin et al., (2010) Increase sensitivity in detecting superficial, low grade bladder cancer by combination analysis of hypermethylation of E-cadherin, p16, p14, RASSF1A genes in urine. Urol Oncol 28:597-602; Sarkis et al., (1994) Association of P53 nuclear overexpression and tumor progression in carcinoma in situ of the bladder. J Urol 152:388-392; Wu X R (2005) Urothelial tumorigenesis: a tale of divergent pathways. Nat Rev Cancer 5:713-725; Cancer Genome Atlas Research Network (2014) Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507:315-322].

Because urine cytology is relatively insensitive for the detection of recurrence, cystoscopies are performed as often as every three months in BC patients in the U.S. In fact, the cost of managing these patients is in aggregate higher than the cost of managing any other type of cancer, and amounts to 3 billion dollars annually [Netto G J, Epstein J I (2010) Theranostic and prognostic biomarkers: genomic applications in urological malignancies. Pathology 42:384-394]. A non-invasive test that could predict which of these patients are most likely to develop recurrent BC could therefore be both medically and economically important.

More than 400,000 new cases of urologic transitional cell carcinoma are diagnosed worldwide each year (Antoni, S., Ferlay, J., Soerjomataram, I., Znaor, A., Jemal, A., & Bray, F. (2017). Bladder Cancer Incidence and Mortality: A Global Overview and Recent Trends. Eur Urol, 71(1), 96-108. doi: 10.1016/j.eururo.2016.06.010). Although most of these urothelial carcinomas arise in the bladder in the lower urinary tract, 5-10% originate in the upper urinary tract in the renal pelvis and/or ureter (Roupret, M., Babjuk, M., Comperat, E., Zigeuner, R., Sylvester, R. J., Burger, M., Cowan, N. C., Bohle, A., Van Rhijn, B. W., Kaasinen, E., Palou, J., & Shariat, S. F. (2015). European Association of Urology Guidelines on Upper Urinary Tract Urothelial Cell Carcinoma: 2015 Update. Eur Urol, 68(5), 868-879. doi: 10.1016/j.eururo.2015.06.044; Soria, F., Shariat, S. F., Lerner, S. P., Fritsche, H. M., Rink, M., Kassouf, W., Spiess, P. E., Lotan, Y., Ye, D., Fernandez, M. I., Kikuchi, E., Chade, D. C., Babjuk, M., Grollman, A. P., & Thalmann, G. N. (2017). Epidemiology, diagnosis, preoperative evaluation and prognostic assessment of upper-tract urothelial carcinoma (UTUC). World J Urol, 35(3), 379-387. doi: 10.1007/s00345-016-1928-x). The annual incidence of these upper tract urothelial carcinomas (UTUCs) in Western countries is 1-2 cases per 100,000, but UTUC occurs at a much higher rate in populations exposed to aristolochic acid (AA) (Chen, C. H., Dickman, K. G., Moriya, M., Zavadil, J., Sidorenko, V. S., Edwards, K. L., Gnatenko, D. V., Wu, L., Turesky, R. J., Wu, X. R., Pu, Y. S., & Grollman, A. P. (2012). Aristolochic acid-associated urothelial cancer in Taiwan. Proc Natl Acad Sci USA, 109(21), 8241-8246. doi: 10.1073/pnas.1119920109; Grollman, A. P. (2013). Aristolochic acid nephropathy: Harbinger of a global iatrogenic disease. Environ Mol Mutagen, 54(1), 1-7. doi: 10.1002/em.21756; Lai, M. N., Wang, S. M., Chen, P. C., Chen, Y. Y., & Wang, J. D. (2010). Population-based case-control study of Chinese herbal products containing aristolochic acid and urinary tract cancer risk. J Natl Cancer Inst, 102(3), 179-186. doi: 10.1093/jnci/djp467; Taiwan Cancer Registry. (2017). Bureau of Health Promotion, Dept. of Health, Taiwan. The incidence of renal pelvic and ureteral tumor in Taiwan. Taiwan cancer registry. Retrieved Aug. 14, 2017, from URL cris.bhp.doh.gov.tw/pagepub/Home.aspx?itemNo=cr.q.10). AA is a carcinogenic and nephrotoxic nitrophenanthrene carboxylic acid produced by Aristolochia plants (Hsieh, S. C., Lin, I. H., Tseng, W. L., Lee, C. H., & Wang, J. D. (2008). Prescription profile of potentially aristolochic acid containing Chinese herbal products: an analysis of National Health Insurance data in Taiwan between 1997 and 2003. Chin Med, 3, 13. doi: 10.1186/1749-8546-3-13; National Toxicology Program. (2011). Aristolochic acids. Rep Carcinog, 12, 45-49). An etiological link between AA exposure and UTUC has been established in two distinct populations. The first resides in Balkan countries where Aristolochia plants grow naturally in wheat fields (Jelakovic, B., Karanovic, S., Vukovic-Lela, I., Miller, F., Edwards, K. L., Nikolic, J., Tomic, K., Slade, N., Brdar, B., Turesky, R. J., Stipancic, Z., Dittrich, D., Grollman, A. P., & Dickman, K. G. (2012). Aristolactam-DNA adducts are a biomarker of environmental exposure to aristolochic acid. Kidney Int, 81(6), 559-567. doi: 10.1038/ki.2011.371). The second population is in Asia, where Aristolochia herbs are widely used in the practice of Traditional Chinese Medicine (Grollman, 2013; National Toxicology Program, 2011). The public health threat posed by the medicinal use of Aristolochia herbs is exemplified by Taiwan, which has the highest incidence of UTUC in the world (Chen, C. H., Dickman, K. G., Moriya, M., Zavadil, J., Sidorenko, V. S., Edwards, K. L., Gnatenko, D. V., Wu, L., Turesky, R. J., Wu, X. R., Pu, Y. S., & Grollman, A. P. (2012). Aristolochic acid-associated urothelial cancer in Taiwan. Proc Natl Acad Sci USA, 109(21), 8241-8246. doi: 10.1073/pnas.1119920109; Yang, M. H., Chen, K. K., Yen, C. C., Wang, W. S., Chang, Y. H., Huang, W. J., Fan, F. S., Chiou, T. J., Liu, J. H., & Chen, P. M. (2002). Unusually high incidence of upper urinary tract urothelial carcinoma in Taiwan. Urology, 59(5), 681-687). More than one-third of the adult population in Taiwan has been prescribed herbal remedies containing AA (Hsieh, S. C., Lin, I. H., Tseng, W. L., Lee, C. H., & Wang, J. D. (2008). Prescription profile of potentially aristolochic acid containing Chinese herbal products: an analysis of National Health Insurance data in Taiwan between 1997 and 2003. Chin Med, 3, 13. doi: 10.1186/1749-8546-3-13), resulting in an unusually high (37%) proportion of UTUC cases relative to all urothelial cancers (Taiwan Cancer Registry. (2017). Bureau of Health Promotion, Dept. of Health, Taiwan. The incidence of renal pelvic and ureteral tumor in Taiwan. Taiwan cancer registry. Retrieved Aug. 14, 2017, from URL cris.bhp.doh.gov.tw/pagepub/Home.aspx?itemNo=cr.q.10).

Nephroureterectomy can be curative for patients with UTUC when it is detected at an early stage (Li, C. C., Chang, T. H., Wu, W. J., Ke, H. L., Huang, S. P., Tsai, P. C., Chang, S. J., Shen, J. T., Chou, Y. H., & Huang, C. H. (2008). Significant predictive factors for prognosis of primary upper urinary tract cancer after radical nephroureterectomy in Taiwanese patients. Eur Urol, 54(5), 1127-1134. doi: 10.1016/j.eururo.2008.01.054). However, these cancers are largely silent until the onset of overt clinical symptoms, typically hematuria, and as a result, most patients are diagnosed only at an advanced stage (Roupret, M., Babjuk, M., Comperat, E., Zigeuner, R., Sylvester, R. J., Burger, M., Cowan, N. C., Bohle, A., Van Rhijn, B. W., Kaasinen, E., Palou, J., & Shariat, S. F. (2015). European Association of Urology Guidelines on Upper Urinary Tract Urothelial Cell Carcinoma: 2015 Update. Eur Urol, 68(5), 868-879. doi: 10.1016/j.eururo.2015.06.044). Diagnostic tests for the detection of early-stage UTUC are not currently available. There is thus a need for clinical tools that can be used to identify early UTUCs in populations at risk for developing this type of malignancy. Relapse following surgery is also a concern, as UTUC can recur in the contralateral upper urinary tract and/or in the bladder (Roupret, M., Babjuk, M., Comperat, E., Zigeuner, R., Sylvester, R. J., Burger, M., Cowan, N. C., Bohle, A., Van Rhijn, B. W., Kaasinen, E., Palou, J., & Shariat, S. F. (2015). European Association of Urology Guidelines on Upper Urinary Tract Urothelial Cell Carcinoma: 2015 Update. Eur Urol, 68(5), 868-879. doi: 10.1016/j.eururo.2015.06.044; Soria, F., Shariat, S. F., Lerner, S. P., Fritsche, H. M., Rink, M., Kassouf, W., Spiess, P. E., Lotan, Y., Ye, D., Fernandez, M. I., Kikuchi, E., Chade, D. C., Babjuk, M., Grollman, A. P., & Thalmann, G. N. (2017). Epidemiology, diagnosis, preoperative evaluation and prognostic assessment of upper-tract urothelial carcinoma (UTUC). World J Urol, 35(3), 379-387. doi: 10.1007/s00345-016-1928-x). Vigilant surveillance for signs of malignancy is therefore an essential part of follow-up care in UTUC patients, and non-invasive tests for recurrent disease could substantially improve post-surgical management, particularly as urine cytology cannot detect the majority of UTUCs (Baard, J., de Bruin, D. M., Zondervan, P. J., Kamphuis, G., de la Rosette, J., & Laguna, M. P. (2017). Diagnostic dilemmas in patients with upper tract urothelial carcinoma. Nat Rev Urol, 14(3), 181-191. doi: 10.1038/nrurol.2016.252).

SUMMARY

In general, methods and materials for identifying the presence of cancer in a subject with increased sensitivity and specificity as compared to conventional methods of identifying the presence of cancer in a subject are provided herein. In some embodiments, methods provided herein for identifying the presence of cancer in a subject with increased sensitivity and specificity are performed on a liquid sample obtained from the subject (e.g., blood, plasma, or serum), whereas conventional methods of identifying the presence of cancer in a subject do not achieve the level of sensitivity, the level of specificity, or both when performed on a liquid sample obtained from the subject. In some embodiments, methods provided herein for identifying the presence of cancer in a subject with increased sensitivity and specificity are performed prior to having determined that the subject already suffers from cancer, prior to having determined that the subject harbors a cancer cell, and/or prior to the subject exhibiting symptoms associated with cancer. In some embodiments, methods provided herein for identifying the presence of cancer in a subject with increased sensitivity and specificity are used as a first line detection method, and not simply as a confirmation (e.g., an “overcall”) of another detection method that the subject has cancer.
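For orientation, the terms sensitivity and specificity used above can be written as simple ratios. The following is a minimal sketch in Python; the function names and the counts are hypothetical illustrations and are not data from this disclosure.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of subjects who have cancer that the test calls positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of subjects without cancer that the test calls negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts, for illustration only (not results from this disclosure).
example_sensitivity = sensitivity(true_positives=70, false_negatives=30)
example_specificity = specificity(true_negatives=99, false_positives=1)
```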

In some embodiments, provided herein are methods for identifying the presence of pancreatic cancer in a subject that include: detecting in a first biological sample obtained from the subject the presence of one or more genetic biomarkers in one or more of the following genes: KRAS, TP53, CDKN2A, or SMAD4; detecting a level of one or more of the following protein biomarkers in a second biological sample obtained from the subject: carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), or osteopontin (OPN); comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers; and identifying the presence of pancreatic cancer in the subject when the presence of one or more genetic biomarkers is detected, the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers, or both. In some embodiments of methods for identifying the presence of pancreatic cancer in a subject, the first biological sample, the second biological sample, or both include plasma. In some embodiments of methods for identifying the presence of pancreatic cancer in a subject, the first and second biological samples are the same. In some embodiments of methods for identifying the presence of pancreatic cancer in a subject, the presence of one or more genetic biomarkers in each of: KRAS, TP53, CDKN2A, and SMAD4 is detected. In some embodiments of methods for identifying the presence of pancreatic cancer in a subject, the level of each of carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), and osteopontin (OPN) is detected. In some embodiments of methods for identifying the presence of pancreatic cancer in a subject, the presence of one or more genetic biomarkers in one or more of KRAS, TP53, CDKN2A, or SMAD4 is detected using a multiplex PCR-based sequencing assay that includes: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments of methods for identifying the presence of pancreatic cancer in a subject, detecting the presence of one or more genetic biomarkers, detecting the level of one or more protein biomarkers, or both is performed when the subject is not known to harbor a cancer cell. In some embodiments of methods for identifying the presence of pancreatic cancer in a subject, the subject is administered one or more therapeutic interventions (e.g., surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, and/or an immune checkpoint inhibitor).
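The identifying step of the pancreatic cancer method above (a genetic biomarker is detected, a protein level exceeds its reference level, or both) can be illustrated with a short sketch. The sketch below is written in Python under stated assumptions: the function name, the data structures, and the reference values used in the example call are hypothetical, and it assumes that a single elevated protein biomarker is sufficient for the protein arm, which is only one possible reading of the text above.

```python
from typing import Dict, Iterable, Set

# Gene and protein panels as listed in the text above.
PANCREATIC_GENES: Set[str] = {"KRAS", "TP53", "CDKN2A", "SMAD4"}
PANCREATIC_PROTEINS: Set[str] = {"CA19-9", "CEA", "HGF", "OPN"}


def identify_pancreatic_cancer(
    mutated_genes: Iterable[str],
    protein_levels: Dict[str, float],
    reference_levels: Dict[str, float],
) -> bool:
    """Return True when either arm of the combined test is positive.

    Assumption: one elevated panel protein is enough for the protein arm.
    """
    # Genetic arm: at least one detected biomarker lies in a panel gene.
    genetic_positive = any(gene in PANCREATIC_GENES for gene in mutated_genes)

    # Protein arm: a measured panel protein exceeds its reference level.
    protein_positive = any(
        name in PANCREATIC_PROTEINS
        and name in reference_levels
        and level > reference_levels[name]
        for name, level in protein_levels.items()
    )
    return genetic_positive or protein_positive


# Hypothetical reference levels in arbitrary units, for illustration only.
example_call = identify_pancreatic_cancer(
    mutated_genes=[],
    protein_levels={"CEA": 10.0},
    reference_levels={"CEA": 5.0},
)
```

Keeping the two arms as separate boolean values makes it straightforward to report which arm produced a positive call.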

In some embodiments, provided herein are methods for identifying the presence of cancer in a subject that include: detecting in a first biological sample obtained from the subject the presence of one or more genetic biomarkers in one or more of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS; detecting a level of one or more of the following protein biomarkers in a second biological sample obtained from the subject: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or MPO; comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers; and identifying the presence of cancer in the subject when the presence of one or more genetic biomarkers is detected, the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers, or both. In some embodiments of methods for identifying the presence of cancer in a subject, the first biological sample, the second biological sample, or both include plasma. In some embodiments of methods for identifying the presence of cancer in a subject, the first and second biological samples are the same. In some embodiments of methods for identifying the presence of cancer in a subject, the presence of one or more genetic biomarkers in each of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS is detected. In some embodiments of methods for identifying the presence of cancer in a subject, the level of each of CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and MPO is detected. In some embodiments of methods for identifying the presence of cancer in a subject, the presence of one or more genetic biomarkers in one or more of NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS is detected using a multiplex PCR-based sequencing assay that includes: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments of methods for identifying the presence of cancer in a subject, the cancer is liver cancer, ovary cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. In some embodiments of methods for identifying the presence of cancer in a subject, detecting the presence of one or more genetic biomarkers, detecting the level of one or more protein biomarkers, or both is performed when the subject is not known to harbor a cancer cell. In some embodiments of methods for identifying the presence of cancer in a subject, the subject is administered one or more therapeutic interventions (e.g., surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, and/or an immune checkpoint inhibitor).
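Steps a through c of the multiplex PCR-based sequencing assay described above yield sequencing reads that share a UID whenever they derive from the same template molecule. One way such reads might be handled downstream is sketched below in Python. The grouping follows the UID-family concept in the text; the consensus step, the function names, and the thresholds (`min_family_size`, `min_agreement`) are assumptions added for illustration and are not stated in the paragraph above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_uid(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect (uid, sequence) reads into UID-families keyed by UID."""
    families: Dict[str, List[str]] = defaultdict(list)
    for uid, sequence in reads:
        families[uid].append(sequence)
    return dict(families)


def consensus_calls(
    families: Dict[str, List[str]],
    min_family_size: int = 2,      # hypothetical threshold
    min_agreement: float = 0.95,   # hypothetical threshold
) -> Dict[str, str]:
    """Keep one sequence per family when nearly all of its reads agree.

    Assumption: a variant present in almost every read of a family came
    from the original template rather than from a PCR or sequencing error.
    """
    calls: Dict[str, str] = {}
    for uid, sequences in families.items():
        if len(sequences) < min_family_size:
            continue  # too few redundant reads to judge agreement
        sequence, count = Counter(sequences).most_common(1)[0]
        if count / len(sequences) >= min_agreement:
            calls[uid] = sequence
    return calls


# Toy reads for illustration only: (UID, read sequence).
example_reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACGT"),
    ("CCG", "ACGA"), ("CCG", "ACGT"),
    ("GGA", "ACGA"),
]
example_families = group_by_uid(example_reads)
example_calls = consensus_calls(example_families)
```

In this toy input only the family tagged `AAT` yields a call: the `CCG` family is split between two sequences and the `GGA` family has a single read.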

In some embodiments, provided herein are methods for identifying thepresence of cancer in a subject that include: detecting in a firstbiological sample obtained from the subject the presence of one or moregenetic biomarkers in one or more of the following genes: NRAS, CTNNB1,PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1,TP53, PPP2R1A, or GNAS; detecting a level of one or more of thefollowing protein biomarkers in a second biological sample obtained fromthe subject: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, follistatin,G-CSF, or CA15-3; comparing the detected levels of the one or moreprotein biomarker to one or more reference levels of the proteinbiomarkers; and identifying the presence of cancer in the subject whenthe presence of one or more genetic biomarkers is detected, the detectedlevels of the one or more protein biomarkers are higher than thereference levels of the one or more protein biomarkers, or both. In someembodiments of methods for identifying the presence of cancer in asubject, the first biological sample, the second biological sample, orboth includes plasma. In some embodiments of methods for identifying thepresence of cancer in a subject, the first and second biological samplesare the same. In some embodiments of methods for identifying thepresence of cancer in a subject, the presence of one or more geneticbiomarkers in each of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF,CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS isdetected. In some embodiments of methods for identifying the presence ofcancer in a subject, the level of each of CA19-9, CEA, HGF, OPN, CA125,AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3 is detected. 
In some embodiments of methods for identifying the presence of cancer in a subject, the presence of one or more genetic biomarkers in one or more of NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS is detected using a multiplex PCR-based sequencing assay that includes: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments of methods for identifying the presence of cancer in a subject, the cancer is liver cancer, ovary cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. In some embodiments of methods for identifying the presence of cancer in a subject, detecting the presence of one or more genetic biomarkers, detecting the level of one or more protein biomarkers, or both is performed when the subject is not known to harbor a cancer cell. In some embodiments of methods for identifying the presence of cancer in a subject, the subject is administered one or more therapeutic interventions (e.g., surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, and/or an immune checkpoint inhibitor).

In some embodiments, provided herein are methods for identifying thepresence of cancer in a subject that include: detecting in a firstbiological sample obtained from the subject the presence of one or moregenetic biomarkers in one or more of the following genes: NRAS, CTNNB1,PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1,TP53, PPP2R1A, or GNAS; detecting a level of one or more of thefollowing protein biomarkers in a second biological sample obtained fromthe subject: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, orCA15-3; comparing the detected levels of the one or more proteinbiomarker to one or more reference levels of the protein biomarkers; andidentifying the presence of cancer in the subject when the presence ofone or more genetic biomarkers is detected, the detected levels of theone or more protein biomarkers are higher than the reference levels ofthe one or more protein biomarkers, or both. In some embodiments ofmethods for identifying the presence of cancer in a subject, the firstbiological sample, the second biological sample, or both includesplasma. In some embodiments of methods for identifying the presence ofcancer in a subject, the first and second biological samples are thesame. In some embodiments of methods for identifying the presence ofcancer in a subject, the presence of one or more genetic biomarkers ineach of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN,FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS is detected. In someembodiments of methods for identifying the presence of cancer in asubject, the level of each of CA19-9, CEA, HGF, OPN, CA125, AFP,prolactin, TIMP-1, and CA15-3 is detected. 
In some embodiments of methods for identifying the presence of cancer in a subject, the presence of one or more genetic biomarkers in one or more of NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS is detected using a multiplex PCR-based sequencing assay that includes: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments of methods for identifying the presence of cancer in a subject, the cancer is liver cancer, ovary cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. In some embodiments of methods for identifying the presence of cancer in a subject, detecting the presence of one or more genetic biomarkers, detecting the level of one or more protein biomarkers, or both is performed when the subject is not known to harbor a cancer cell. In some embodiments of methods for identifying the presence of cancer in a subject, the subject is administered one or more therapeutic interventions (e.g., surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, and/or an immune checkpoint inhibitor).

In some embodiments, provided herein are methods for identifying the presence of bladder cancer or an upper tract urothelial carcinoma in a subject that include: detecting in a first biological sample obtained from the subject the presence of one or more genetic biomarkers in one or more of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, or VHL; detecting the presence of at least one mutation in a TERT promoter in a second biological sample obtained from the subject; detecting the presence of aneuploidy in a third biological sample obtained from the subject; and identifying the presence of bladder cancer or an upper tract urothelial carcinoma in the subject when the presence of one or more genetic biomarkers is detected, the presence of the at least one mutation in the TERT promoter is detected, the presence of aneuploidy is detected, or combinations thereof. In some embodiments of methods for identifying the presence of bladder cancer or an upper tract urothelial carcinoma in a subject, the first biological sample and the second biological sample are the same; the first biological sample and the third biological sample are the same; the second biological sample and the third biological sample are the same; or the first biological sample, the second biological sample, and the third biological sample are the same. In some embodiments of methods for identifying the presence of bladder cancer or an upper tract urothelial carcinoma in a subject, the first biological sample, the second biological sample, or the third biological sample is a urine sample. In some embodiments of methods for identifying the presence of bladder cancer or an upper tract urothelial carcinoma in a subject, the presence of aneuploidy is detected on one or more of chromosome arms 5q, 8q, or 9p. 
In some embodiments of methods for identifying the presence ofbladder cancer or an upper tract urothelial carcinoma in a subject, thepresence of one or more genetic biomarkers in each of: TP53, PIK3CA,FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL is detected. In someembodiments of methods for identifying the presence of bladder cancer oran upper tract urothelial carcinoma in a subject, the presence of one ormore genetic biomarkers in one or more of TP53, PIK3CA, FGFR3, KRAS,ERBB2, CDKN2A, MLL, HRAS, MET, or VHL is detected using a multiplexPCR-based sequencing assay that comprises: a. assigning a uniqueidentifier (UID) to each of a plurality of template molecules present inthe sample; b. amplifying each uniquely tagged template molecule tocreate UID-families; and c. redundantly sequencing the amplificationproducts. In some embodiments of methods for identifying the presence ofbladder cancer or an upper tract urothelial carcinoma in a subject,detecting the presence of one or more genetic biomarkers, detecting thepresence of the at least one mutation in the TERT promoter, or detectingthe presence of aneuploidy is performed when the subject is not known toharbor a cancer cell. In some embodiments of methods for identifying thepresence of bladder cancer or an upper tract urothelial carcinoma in asubject, the subject is administered one or more therapeuticinterventions (e.g., surgery, adjuvant chemotherapy, neoadjuvantchemotherapy, radiation therapy, immunotherapy, targeted therapy, and/oran immune checkpoint inhibitor).

In some embodiments, provided herein are methods for identifying thepresence of ovarian or endometrial cancer in a subject that include:detecting in a first biological sample obtained from the subject thepresence of one or more genetic biomarkers in one or more of thefollowing genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43,PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, orCDKN2A; detecting the presence of aneuploidy in a second biologicalsample obtained from the subject; and identifying the presence ofovarian or endometrial cancer in the subject when the presence of one ormore genetic biomarkers is detected, the presence of aneuploidy isdetected, or both. In some embodiments of methods for identifying thepresence of ovarian or endometrial cancer in a subject, the firstbiological sample and the second biological sample are the same. In someembodiments of methods for identifying the presence of ovarian orendometrial cancer in a subject, the first biological sample or thesecond biological sample is a cervical sample or an endometrial sample.In some embodiments of methods for identifying the presence of ovarianor endometrial cancer in a subject, the presence of aneuploidy isdetected on one or more of chromosome arms 4p, 7q, 8q, or 9q. In someembodiments of methods for identifying the presence of ovarian orendometrial cancer in a subject, the presence of one or more geneticbiomarkers in each of: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43,PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, andCDKN2A is detected. In some embodiments of methods for identifying thepresence of ovarian or endometrial cancer in a subject, the presence ofone or more genetic biomarkers in one or more of: NRAS, PTEN, FGFR2,KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7,PIK3R1, APC, EGFR, BRAF, or CDKN2A is detected using a multiplexPCR-based sequencing assay that comprises: a. 
assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments of methods for identifying the presence of ovarian or endometrial cancer in a subject, the methods further include detecting in a circulating tumor DNA (ctDNA) sample obtained from the subject the presence of at least one genetic biomarker in one or more of the following genes: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, or TP53. In some embodiments of methods for identifying the presence of ovarian or endometrial cancer in a subject, detecting the presence of one or more genetic biomarkers or detecting the presence of aneuploidy is performed when the subject is not known to harbor a cancer cell. In some embodiments of methods for identifying the presence of ovarian or endometrial cancer in a subject, the subject is administered one or more therapeutic interventions (e.g., surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, and/or an immune checkpoint inhibitor).

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Methods and materials are described herein for use in the present invention; other, suitable methods and materials known in the art can also be used. The materials, methods, and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, sequences, database entries, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. Headers used in various sections herein are not to be construed as limiting the disclosure of that section to the topic of the header, nor as limiting the disclosure of other sections to topics other than that of the header. Such headers are exemplary, and are simply included for ease of reading. Such headers are further not intended to restrict the applicability or generality of that section to other parts of this disclosure.

Other features and advantages of the invention will be apparent from the following detailed description and figures, and from the claims.

DESCRIPTION OF THE DRAWINGS

FIG. 1 contains a schematic overview showing a CancerSEEK test for the detection and localization of cancers.

FIG. 2 contains graphs showing the development of a PCR-based assay to identify tumor-specific mutations in plasma samples. Colored curves indicate the proportion of cancers of the eight types evaluated in this study that can be detected with an increasing number of short (<40 bp) amplicons. The sensitivity of detection increases with the number of amplicons but plateaus at ~60 amplicons. Colored dots indicate the fraction of cancers detected using the 61-amplicon panel used in 805 cancers evaluated in our study, which averaged 82% (see main text). Publicly available sequencing data was obtained from the Catalog of Somatic Mutations in Cancer (COSMIC) repository.

FIG. 3 contains a graph showing the distribution of the number of detectable mutations within the 805 primary tumors evaluated.

FIG. 4 contains graphs showing the performance of CancerSEEK. (A) A receiver operator characteristic (ROC) curve for CancerSEEK. The red point on the curve indicates the test's average performance (61%) at >99% specificity. Error bars represent 95% confidence intervals for sensitivity and specificity at this particular point. The median performance among the 8 cancer types assessed was 70%, as noted in the main text. (B) Sensitivity of CancerSEEK by stage. Error bars represent standard errors of the median. (C) Sensitivity of CancerSEEK by tumor type. Error bars represent 95% confidence intervals.

FIG. 5 contains waterfall plots of the ctDNA and eight protein features used in CancerSEEK that illustrate the separation between healthy donors and cancer patients. Values are sorted from high (left) to low (right). Each column represents an individual patient sample (red, cancer patient; blue, healthy control).

FIG. 6 contains a graph showing the principal component analysis of the ctDNA and eight protein features used in CancerSEEK. Each dot represents an individual patient sample (red, cancer patient; blue, healthy control).

FIG. 7 contains graphs showing the effect of individual CancerSEEKfeatures on sensitivity. (A) Sensitivity of CancerSEEK by tumor type asin FIG. 4C. (B-J) Each panel displays the sensitivity achieved when aparticular CancerSEEK feature is excluded from the logistic regression.The difference in sensitivity relative to that achieved by CancerSEEKreflects the relative contribution of each biomarker to the performanceof the CancerSEEK test.

FIG. 8 contains a graph showing identification of cancer type bysupervised machine learning for patients classified by CancerSEEK aspositive. Percentages correspond to the proportion of patients correctlyclassified by one of the two most likely types (sum of light and darkblue bars) or the most likely type (light blue bar). Predictions for allpatients for all cancer types are provided in Table 6. Error barsrepresent 95% confidence intervals.

FIG. 9 contains graphs showing that combining ctDNA KRAS mutations with protein biomarkers increases sensitivity for early detection of PDAC. (A) Sensitivities of ctDNA KRAS mutations alone, ctDNA KRAS mutations plus CA19-9, and ctDNA KRAS mutations with CA19-9 and other proteins (combination assay) with respect to AJCC stage. (B) Sensitivities of ctDNA KRAS mutations alone, ctDNA KRAS mutations plus CA19-9, and ctDNA KRAS mutations with CA19-9 and other proteins (combination assay) with respect to tumor size. Error bars represent 95% confidence intervals.

FIG. 10 contains a diagram showing that combining ctDNA and protein markers increases sensitivity because a large proportion of patients are detected by only one marker. Number of patients detected by ctDNA KRAS mutations (red circle), CA19-9 (green circle), and the three other protein biomarkers (blue circle), and combinations thereof (overlapping regions). Eighty patients (36% of the total) were not detectable by any of the three markers.

FIG. 11 contains a graph showing that mutant allele frequencies (MAFs)of KRAS and TP53 mutations are strongly correlated (Pearson's r=0.885)in the plasma of the 12 patients whose plasma contained detectableamounts of both mutations, providing validation of the reliability ofthe ctDNA assay and its quantitative nature. Shaded region representsthe 95% confidence interval.

FIG. 12 contains a Kaplan-Meier survival plot of the 221 PDAC patientsincluded in this study stratified by AJCC stage (stage IA or IB: bluecurve, stage IIA or IIB: red curve).

FIG. 13 contains graphs showing correlations between triplex assaymarkers and tumor size. (A) KRAS mutations were found more frequently inlarger tumors than smaller tumors, but the mutant allele frequency didnot correlate with tumor size (Pearson's r=0.039). (B) In patients withelevated CA19-9, CA19-9 plasma concentration weakly correlates withtumor size (Pearson's r=0.287). (C-E) Plasma levels of CEA, HGF, and OPNwere less dependent on tumor size than KRAS mutations or CA19-9 (CEAPearson's r=0.153; HGF Pearson's r=0.037; OPN Pearson's r=0.018). Shadedregions represent 95% confidence intervals.

FIG. 14 contains graphs showing that levels of prolactin (G) and midkine (E) were significantly elevated in samples that were collected after the administration of anesthesia but before surgical excision. In contrast, no difference was observed in the proportion of samples with mutant KRAS ctDNA (A), CA19-9 plasma concentration (B), CEA plasma concentration (C), HGF plasma concentration (D), and OPN plasma concentration between samples that were collected before or after the administration of anesthesia. N.S. not significant, P>0.05 (Exact permutation t-test).

FIG. 15 contains a graph showing fold change in protein biomarker levels from 29 pairs of plasma samples collected before and immediately after the administration of anesthesia. Of the six markers evaluated, only prolactin and midkine were found to be elevated by anesthesia, in perfect accordance with the correlation between collection site and protein levels.

FIG. 16 contains Kaplan-Meier survival plots stratified by independent predictors of overall survival identified by multivariate analysis: (A) combination assay status (HR=1.76, 95% CI 1.10-2.84, p=0.018); (B) grade of differentiation (poorly differentiated, HR=1.72, 95% CI 1.11-2.66, p=0.015); (C) lymphovascular invasion (present, HR=1.81, 95% CI 1.06-3.09, p=0.028); (D) nodal disease (present, HR=2.35, 95% CI 1.20-4.61, p=0.013); (E) margin status (HR=1.59, 95% CI 1.01-2.55, p=0.050).

FIG. 17 contains receiver operator characteristic (ROC) curves for (A) KRAS mutations, (B) CA19-9, (C) CEA, (D) HGF, (E) OPN, and (F) Combination assay. (A-E) ROC curves demonstrate the performance of each combination assay biomarker individually. The red points on the curves indicate the marker performance at the thresholds used in the combination assay. Error bars represent 95% confidence intervals for sensitivity and specificity at the particular threshold (red font). (F) ROC curves demonstrating the performance of the combination assay when the KRAS threshold was varied and CA19-9, CEA, HGF, and OPN thresholds were fixed at the levels used in the combination assay (black curve), the CA19-9 threshold was varied and KRAS, CEA, HGF, and OPN thresholds were fixed at the levels used in the combination assay (red curve), the CEA threshold was varied and KRAS, CA19-9, HGF, and OPN thresholds were fixed at the levels used in the combination assay (blue curve), the HGF threshold was varied and KRAS, CA19-9, CEA, and OPN thresholds were fixed at the levels used in the combination assay (green curve), and the OPN threshold was varied and KRAS, CA19-9, CEA, and HGF thresholds were fixed at the levels used in the combination assay (orange curve). The intersection of these curves designates the overall performance of the triplex assay (64% sensitivity, 99.5% specificity).

FIG. 18 shows the performance of a marker panel for identifying cancer in 8 cancer types. (A) Numerical data. (B) Graphical data.

FIG. 19 contains a schematic of an exemplary PapSEEK test for thedetection of tumor DNA in the Pap brush, Tao brush, and plasma samplesof patients with endometrial or ovarian cancers. Tumor cells shed fromovarian or endometrial cancers are carried into the uterine cavity,where they can be collected by the Tao brush. The tumor cells that passdown into the endocervical canal can be captured by the Pap brush usedin the routine Pap test. These brushes are dipped into a liquidfixative, from which DNA is isolated and sequenced. The sequences areanalyzed for somatic mutations and aneuploidy. Additionally, tumor DNAshed into the bloodstream can be detected by ctDNA analysis.

FIG. 20 contains graphs showing detection of aneuploidy and somaticmutations (PapSEEK) in Pap brush (A) and Tao brush samples (B) fromhealthy controls and patients with endometrial and ovarian cancers.Error bars represent 95% confidence intervals.

FIG. 21 contains Venn diagrams showing that combined testing for somaticmutations and aneuploidy increased sensitivity for both ovarian andendometrial cancers, in the Pap (A) as well as the Tao brush (B)samples. For ovarian cancer, combined testing of Pap brush and plasmasamples also increased sensitivity compared to testing either sampletype alone (C).

FIG. 22 contains graphs showing detection of endometrial (A) or ovariancancers (B) in Pap or Tao brush samples with PapSEEK, by stage. Errorbars represent 95% confidence intervals.

FIG. 23 contains a graph showing detection of ovarian cancer in Pap andplasma samples. Error bars represent 95% confidence intervals.

FIG. 24 contains a graph showing detection of endometrial and ovariancancers with PapSEEK in the Pap brush, Tao brush, and plasma samples.Error bars represent 95% confidence intervals.

FIG. 25 contains a schematic drawing of an exemplary approach used toevaluate urinary cells in this study.

FIG. 26 contains a flow diagram indicating the number of patients in theEarly Detection Cohort and the Surveillance Cohort with summaries of thedata. Cytology was performed on a subset of the patients.

FIG. 27 contains graphs showing the fraction of mutations found in theten-gene panel in 231 urinary cell samples assessed in the EarlyDetection Cohort (A) and 132 urinary cell samples assessed in theSurveillance Cohort (B).

FIG. 28 contains Venn Diagrams of the distribution of samples that werepositive by each of the three assays for the Early Detection Cohort (A)and the Surveillance Cohort (B). URO=Ten gene panel, TERT=TERT promoterregion, ANEU=Aneuploidy test.

FIG. 29 contains bar graphs of the lead time between a positive UroSEEKtest and the detection of disease at the clinical level in the EarlyDetection Cohort (A) and the Surveillance Cohort (B).

FIG. 30 contains bar graphs showing the performance of cytology comparedto UroSEEK in diagnosis of low and high grade urothelial neoplasms inthe Early Detection Cohort and the Surveillance Cohort.

FIG. 31 contains a schematic diagram of an exemplary non-invasivedetection of upper tract urothelial cancer (UTUC) through geneticanalysis of urinary cell DNA. Upper urinary tract tumors arise in therenal pelvis and/or ureter and are in direct contact with urine. Urinecontains a mixture of normal cells that are constitutively shed fromvarious sites along the urinary system, along with malignant cells whenpresent (blue). The UroSEEK assay relies on mutational analyses of genesfrequently mutated in urinary cancers along with a determination ofchromosome losses and gains.

FIG. 32 contains a Venn diagram showing the distribution of positiveresults for each of the three UroSEEK assays.

FIG. 33 contains a graph showing a comparison of copy number variationsin matched tumor and urinary cell DNA samples from the UTUC cohort.Primary tumor is shown in the top of each section and urinary cell DNAon the bottom. Chromosome gains are in blue while losses are in red.Significance levels for gains and losses were set at Z scores>3 and <−3,respectively. X axis is Chromosome Arm. Y axis is Z score.

FIG. 34 contains a graph showing the fraction of total mutations foreach gene in the 10-gene panel used to analyze urinary cell DNA fromUTUC patients.

FIG. 35 contains graphs showing comparisons of copy number variations inmatched tumor and urinary cell DNA samples from four individual UTUCpatients (FIGS. 35A-D). Z-scores>3 or <−3 were considered as significantfor chromosome gains or losses, respectively. N.S. indicates notsignificant. Data for all 56 patients are provided in Table 28.

FIG. 36 contains a schematic showing an overview of an exemplary WALDO approach. (A) A single primer pair amplifies ~38,000 long interspersed nucleotide elements (LINEs). (B) A test sample is matched to seven euploid samples with genomic DNA of similar size. (C) The genome is divided into 4361 intervals, each 500 kb in size. (D) The reads within these 500-kb genomic intervals in the euploid samples are grouped into 4361 clusters. All the 500-kb genomic intervals within each cluster have similar read depths. (E) The reads from each of the 500-kb genomic intervals in the test sample are placed into the pre-defined clusters. (F) Statistical tests, including a Support Vector Machine (SVM)-based algorithm, are used to determine whether the total reads from all the 500-kb genomic intervals on each chromosome arm are distributed as expected if the sample were euploid. The statistical tests are based on the observed distribution of reads within the clusters of the test sample, not on comparison to the reads in euploid samples. (G) Germline sequence variants at sites of known common polymorphisms within the LINEs provide information about arm-level allelic imbalance that can also be used to assess aneuploidy of individual chromosome arms. These same polymorphisms can be used to determine whether any two samples are derived from the same individual. (H) When a matched normal sample from the same individual is available, WALDO can detect the number and nature of single base substitutions and insertions and deletions within the LINEs.
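For illustration only, the within-sample comparison outlined in panels (D) through (F) can be sketched as a simplified arm-level Z score. Each interval in the test sample is compared with the mean and variance of the other test-sample intervals that share its pre-defined cluster. This is a minimal sketch under stated assumptions, not the WALDO implementation: the function name `arm_z_scores`, the toy read counts, and the use of a plain Z score in place of the SVM-based algorithm are all assumptions of this example.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pvariance


def arm_z_scores(read_counts, cluster_of, arm_of):
    """Compute one Z score per chromosome arm from test-sample reads only.

    read_counts[i]: reads in genomic interval i of the test sample.
    cluster_of[i]:  pre-defined cluster label of interval i.
    arm_of[i]:      chromosome arm that contains interval i.
    """
    # Mean and variance of read counts within each cluster of the test sample.
    by_cluster = defaultdict(list)
    for count, cluster in zip(read_counts, cluster_of):
        by_cluster[cluster].append(count)
    mu = {c: mean(v) for c, v in by_cluster.items()}
    var = {c: pvariance(v) for c, v in by_cluster.items()}

    # Sum observed reads, expected reads, and variance over each arm.
    observed = defaultdict(float)
    expected = defaultdict(float)
    variance = defaultdict(float)
    for count, cluster, arm in zip(read_counts, cluster_of, arm_of):
        observed[arm] += count
        expected[arm] += mu[cluster]
        variance[arm] += var[cluster]

    return {
        arm: (observed[arm] - expected[arm]) / sqrt(variance[arm])
        if variance[arm] > 0 else 0.0
        for arm in observed
    }


# Toy data: two clusters ("lo", "hi") and three arms; arm 2p carries ~1.5x reads.
read_counts = [100, 200, 102, 198, 98, 202, 100, 200, 150, 300, 148, 302]
cluster_of = ["lo", "hi"] * 6
arm_of = ["1p"] * 4 + ["1q"] * 4 + ["2p"] * 4
z = arm_z_scores(read_counts, cluster_of, arm_of)
```

In this toy input the gained arm 2p receives the largest positive score. With only three arms the gained arm also inflates the cluster means, so the remaining arms score slightly negative; a genome-wide set of intervals would dilute that effect.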

FIG. 37 contains a graph showing individual chromosome arm gains and losses that were identified in nine cancer types. The average fraction of tumors with a gain or loss in each chromosome arm is depicted in the figure. The same nine tumor types were analyzed in both cohorts, but there was no overlap between the samples assessed by WALDO (red) and GISTIC (blue). WALDO employed the data from LINE sequencing of tumors reported here while GISTIC employed the data from Affymetrix SNP6.0 arrays provided by the TCGA.

FIG. 38 shows aneuploidy detected in plasma samples from cancer patients. Receiver operating characteristics (ROC) and area under the curve (AUC) are shown for three ranges of neoplastic cell fractions. True positives were defined as those samples from cancer patients scoring positive while false positives were defined as those from normal individuals scoring positive. The neoplastic cell fraction of each plasma sample was estimated from driver gene sequencing data as described in the text. (A) Samples with neoplastic cell fractions <0.5%. (B) Samples with neoplastic cell fractions ranging from 0.5-1%. (C) Samples with neoplastic cell fractions >1%.

FIG. 39 contains graphs showing aneuploidy correlation comparisons ofcancers detected using FAST-SeqS and WALDO compared to The Cancer GenomeAtlas (TCGA) using Affymetrix SNP6.0 and GISTIC across 9 differentcancer types. (A) Correlation of the fraction of chromosome arm gains.(B) Correlation of the fraction of chromosome arm losses.

FIG. 40 contains graphs showing aneuploidy comparisons of individual cancer types that were detected using the WALDO framework compared to The Cancer Genome Atlas (TCGA). For each cancer type, the fractions of each chromosome arm gained and lost were compared, and the correlation of these gains and losses between WALDO and TCGA was assessed. Each sub-figure represents a different cancer type. (A) Breast invasive carcinoma (BRCA). (B) Colon adenocarcinoma and rectum adenocarcinoma (COAD; COADREAD). (C) Esophageal carcinoma (ESCA). (D) Head and neck squamous cell carcinoma (HNSC). (E) Liver hepatocellular carcinoma (LIHC). (F) Pancreatic adenocarcinoma (PAAD). (G) Ovarian serous cystadenocarcinoma (OV). (H) Stomach adenocarcinoma (STAD). (I) Uterine corpus endometrial carcinoma (UCEC).

FIG. 41 contains graphs showing Trisomy 21 performance as a function of read depth. DNA samples from individuals with trisomies and normal peripheral white blood cell (WBC) samples were physically mixed at a ratio of 2 ng of normal DNA to ~0.2 ng of Trisomy 21 DNA. The mixtures were created to replicate typical fetal fractions in noninvasive prenatal testing (approximately 10%). Using polymorphisms in the LINE-amplicons, the trisomy admixture of the samples was estimated to range from 7.7% to 10.4%. Using a z threshold of 2.5, sensitivities (A) and specificities (B) were calculated for a range of read depths.
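For illustration only, the calculation of sensitivity and specificity at a fixed z threshold can be sketched as follows. The z values below are made up for this example and are not data from this disclosure, and the function name `sensitivity_specificity` is hypothetical. A sample is scored positive when its z value exceeds the threshold.

```python
def sensitivity_specificity(trisomy_z, euploid_z, threshold=2.5):
    """Return (sensitivity, specificity) at the given z threshold.

    trisomy_z: z values of samples known to contain trisomic DNA.
    euploid_z: z values of samples known to be euploid.
    """
    true_positives = sum(z > threshold for z in trisomy_z)
    true_negatives = sum(z <= threshold for z in euploid_z)
    return true_positives / len(trisomy_z), true_negatives / len(euploid_z)


# Hypothetical z values at one read depth.
trisomy_z = [3.1, 2.8, 4.0, 2.2]
euploid_z = [0.3, -1.1, 2.6, 0.9, 1.4]
sens, spec = sensitivity_specificity(trisomy_z, euploid_z)
# Three of four trisomy mixtures exceed 2.5; four of five euploid samples do not.
```

Repeating this calculation on z values obtained at each read depth would give one sensitivity and one specificity per depth, which is the kind of relationship plotted in panels (A) and (B).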

FIG. 42 contains a graph showing a comparison of the total number of somatic single base substitutions (SBS) that were detected by exome sequencing vs. WALDO.

FIG. 43 contains a graph showing a comparison of the percentages of single base substitution mutations that are A:T>T:A mutations, as detected via exome sequencing vs. WALDO.

FIG. 44 contains graphs showing the spectrums of single basesubstitution (SBS) mutations. (A) SBS identified by WALDO. (B) SBSidentified by exome sequencing.

FIG. 45 contains a graph showing a distribution of the number of genomicintervals included in a cluster for a representative normal WBC sample.

FIG. 46 contains graphs showing distributions of scaled reads. (A)Distribution of scaled reads illustrating that reads in FAST-SeqSamplicon sequencing were not randomly distributed. (B) Representativecluster for a normal WBC sample illustrating the normality of the scaledreads in a cluster. (C) Representative cluster for an aneuploid primarytumor sample illustrating the normality of the scaled reads in acluster.

FIG. 47 contains graphs showing an example of the statistical procedureto identify a chromosome arm gain or loss.

FIG. 48 contains a graph showing an empirical estimation of the varianceof the B-allele frequency for heterozygous SNPs as a function of readdepth. Increasing UID depth improved the estimation of the B-allelefrequency for heterozygous SNPs.
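For illustration only, the dependence of B-allele frequency variance on depth can be sketched with a small simulation. The sketch assumes that each UID at a heterozygous SNP reports the B allele independently with probability 0.5, so that the expected variance at depth n is 0.25/n. This binomial model, the function name `empirical_baf_variance`, and the chosen depths are assumptions of this example and are not taken from the figure.

```python
import random
from statistics import pvariance


def empirical_baf_variance(depth, n_snps=2000, p=0.5, seed=0):
    """Simulate B-allele frequencies at heterozygous SNPs and return their variance.

    depth:  number of UIDs covering each SNP.
    n_snps: number of simulated heterozygous SNPs.
    p:      assumed probability that a UID reports the B allele.
    """
    rng = random.Random(seed)
    bafs = []
    for _ in range(n_snps):
        b_count = sum(rng.random() < p for _ in range(depth))
        bafs.append(b_count / depth)
    return pvariance(bafs)


# Compare simulated variance with the binomial expectation p * (1 - p) / depth.
variances = {depth: empirical_baf_variance(depth) for depth in (10, 100, 1000)}
for depth, observed in variances.items():
    print(depth, round(observed, 6), 0.25 / depth)
```

Under this model the variance falls roughly tenfold for each tenfold increase in depth, which is consistent with the statement that increasing UID depth improves the estimate.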

FIG. 49 shows exemplary pseudocode to generate synthetics with one arm alteration.

FIG. 50 shows exemplary pseudocode to generate synthetics with multiple arm alterations.

FIG. 51 contains a graph showing a distribution of genome-wide aneuploidy scores (SVM scores) as a function of read depth. Lower read depth was more likely to produce higher scores, and failing to correct for UID depth can produce false positives.

FIG. 52 contains a schematic showing an exemplary overview of a bottleneck sequencing methodology. Each color at the top of the figure represents double-stranded DNA from a genome of one cell within a population. Random, nonclonal point mutations (red) are private to individual cells. In contrast, clonal reference changes (A in black) are present in all genomes within the cell population. (step 1) Random shearing generates variably sized DNA molecules. (step 2) Noncomplementary single-stranded regions of the Illumina Y-adapters (P5 in gray and P7 in black) are represented as forked structures ligated to both ends of each DNA molecule. (step 3) Dilution decreases the number of DNA molecules (five are shown) from the original population in a random manner. Ends of the DNA molecules align uniquely to the reference genome. Mapping coordinates are used as unique molecule “barcodes” during data processing. (step 4) PCR primer (black arrowhead) anneals to and extends (hashed lines) the Watson and Crick templates of the original DNA molecule independently. The red asterisk represents an error generated during PCR of the library. (step 5) Watson and Crick templates generate two families of PCR duplicates. Orientation of the P5 (gray) and P7 (black) containing adapters relative to the DNA molecule (insert) distinguishes the two families. P5 and P7 sequences dictate which end will be sequenced in read 1 vs. read 2, respectively, on the Illumina flow cell. Red asterisks represent the PCR error propagated in the Watson but not the Crick family members. In contrast to artifacts, real mutations (C:G mutation in red) will be present in both the Watson and Crick family members. (step 6) The BotSeqS pipeline identifies and quantifies the number of unique DNA molecules and point mutations (C:G in red) in the sequencing data by eliminating artifacts and clonal changes (A:T in black).
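The use of mapping coordinates as molecule “barcodes” and the requirement that a real mutation appear in both the Watson and Crick families can be sketched in Python as follows. This is an illustrative simplification and not the BotSeqS pipeline itself: the `ReadPair` record, the single `base` field standing in for a base call at one position of interest, and the rule that every family member must carry the variant are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """Hypothetical, simplified record for one sequenced read pair."""
    chrom: str
    start: int   # leftmost mapped coordinate of the fragment
    end: int     # rightmost mapped coordinate of the fragment
    strand: str  # "W" (Watson) or "C" (Crick), from P5/P7 orientation
    base: str    # base call at one position of interest


Barcode = Tuple[str, int, int]
Family = Dict[str, List[str]]


def group_by_barcode(pairs: Iterable[ReadPair]) -> Dict[Barcode, Family]:
    """Group PCR duplicates by mapping coordinates, split by strand."""
    families: Dict[Barcode, Family] = defaultdict(lambda: {"W": [], "C": []})
    for pair in pairs:
        families[(pair.chrom, pair.start, pair.end)][pair.strand].append(pair.base)
    return dict(families)


def seen_on_both_strands(family: Family, alt_base: str) -> bool:
    """True if every Watson and every Crick member carries alt_base.

    Assumption: unanimity within each family is required here only to
    keep the sketch short.
    """
    watson, crick = family["W"], family["C"]
    return (
        bool(watson)
        and bool(crick)
        and all(b == alt_base for b in watson)
        and all(b == alt_base for b in crick)
    )


# Hypothetical reads: the first molecule carries G on both strands; the
# second shows T only in its Watson family, as a PCR error would.
example_pairs = [
    ReadPair("chr1", 100, 250, "W", "G"),
    ReadPair("chr1", 100, 250, "W", "G"),
    ReadPair("chr1", 100, 250, "C", "G"),
    ReadPair("chr1", 100, 250, "C", "G"),
    ReadPair("chr1", 300, 480, "W", "T"),
    ReadPair("chr1", 300, 480, "W", "T"),
    ReadPair("chr1", 300, 480, "C", "C"),
]
example_families = group_by_barcode(example_pairs)
```

In this sketch the variant in the second molecule is rejected because it is absent from the Crick family.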

FIG. 53 contains graphs showing that nuclear point mutations increase in normal tissues from individuals with defects in DNA repair or with exposure to environmental carcinogens compared with controls. (A) Comparison of point mutation prevalences in nuclear (Left) and mitochondrial (Right) genome in age-matched normal colon epithelium (filled circle) with different DNA mismatch repair genotypes (PMS2^(+/+) or PMS2^(−/−)) or in age-matched normal kidney cortex (filled square) without (none) or with (aristolochic acid or smoking) carcinogen exposure. Red lines represent average. *P<0.05, t test; **P<0.001 and ***P<0.0001, one-way ANOVA with Bonferroni multiple comparison posttest; ns, not significant, indicates P>0.05. (B) Stacked columns representing the substitution frequencies (y axis) of each substitution out of the six possible types (see legend). Cohort labels are indicated in A directly above each column. Number of substitutions (N) generating each mutational spectrum is indicated on the x axis. n.d., not determined due to an insufficient number of mutations (N=7) for mutational spectrum analysis. *P=0.04, Fisher's exact test; **P=2.6×10⁻⁸ and ***P=1.5×10⁻¹⁶, Fisher's exact test with Bonferroni multiple comparison correction; ns, not significant, indicates P>0.05. All statistical tests in this figure were two-tailed.

FIG. 54 contains graphs showing that normal human tissues accumulate point mutations over a lifetime with genome-specific and tissue-specific mutational patterns. Point mutation prevalences in nuclear (Top) and mitochondrial (Bottom) genome measured in four normal tissue types (brain frontal cortex of 9 individuals, kidney cortex of 5 individuals, colon epithelium of 11 individuals, and duodenum of 1 individual). Twenty-six total individuals were assessed, with each individual contributing to one normal tissue type. Pie chart insets show the prevalences of each substitution out of the six possible substitution types (see pie chart legend, right side). Each pie chart was compiled from the individuals represented in their respective scatter plots, with the exception that duodenum was omitted. The number of substitutions generating the pie charts for the nuclear genome was n=31 for brain, n=73 for kidney, and n=94 for colon, and for the mitochondrial genome was n=181 for brain, n=299 for kidney, and n=116 for colon.

FIG. 55 contains an assessment of duplicate counts with MiSeq™ prerun. Histograms show the distribution of family members (PCR duplicates from individual template molecules, shown on the x-axis). Either two or three serial dilutions (10³, 10⁴, 10⁵, or 10⁶) were evaluated on the MiSeq™ for six samples (COL373, SA_117, KID038, BRA01, BRA04, BRA07) to generate ~5 M properly paired reads per library. Family member counts were determined here using Picard's Estimate Library Complexity program. Libraries generated from the 10⁵ dilution (blue) were subsequently used for the final HiSeq™ run reported in this study. Note that the HiSeq™ distribution is expected to shift to the right compared to the MiSeq™ distribution due to the increase of clusters sequenced per library (~5 M clusters scaled to ~70 M clusters). For example, the BotSeqS libraries from the 10⁶ dilution (red) were not used because the members per family would be too high on a HiSeq™ run, limiting the number of different families that could be evaluated with a given amount of sequencing.

FIG. 56 contains a graph showing family member counts of 44 BotSeqS libraries reported in this study. Horizontal box and whisker plots for 44 BotSeqS libraries (y-axis) and number of members per family (duplicate count, x-axis). White boxes represent the first to third quartile range with the hash mark indicating the median. Whiskers represent 1.5*IQR (interquartile range) and data points outside the whiskers are shown as outliers. An average of 3.97 M (range 0.38 to 10.91 M) unfiltered families per library were assessed. Families were identified through the BotSeqS pipeline using the genomic mapping coordinates as unique molecule identifiers. Names in blue indicate technical replicate samples. Note that Bot01-Bot06 and Bot23-Bot28 were performed on the same samples with a 100-fold difference in dilution (see Table 43).

FIG. 57 contains graphs showing that consideration of both Watson and Crick family members decreases artifacts, specifically G>T transversions. (A) Nuclear point mutation frequencies (y-axis) considering mutations observed in “Watson AND/OR Crick” (black circle) or “Watson AND Crick” families (black square) in normal tissues derived from brain frontal cortex (left side), kidney cortex (middle, shaded), or colon epithelium (right side). Specifically, “OR” mutations represent ≥90% mutation fraction in the Watson family with a minimum of two Watson reads or ≥90% mutation fraction in the Crick family with a minimum of two Crick reads. Note that the “OR” mutations have only the Watson or Crick families represented in the data but not both. “AND” mutations represent ≥90% mutation fraction in the Watson family with a minimum of two Watson reads and ≥90% mutation fraction in the Crick family with a minimum of two Crick reads. “AND” mutations are an internal subset of the “AND/OR” dataset, which is a modified version of the BotSeqS pipeline. Twenty-five individuals are organized by increasing age within each tissue. (B) Pie charts of the frequencies of each nuclear substitution out of the six possible substitution types (see legend) from (A) considering Watson AND/OR Crick (top pies) or Watson AND Crick (bottom pies) in each normal tissue type. The number of nuclear mutations generating mutational spectra for Watson AND/OR Crick was n=616 for brain, n=1,257 for kidney, n=2,542 for colon and for Watson AND Crick was n=33 for brain, n=74 for kidney, n=99 for colon.
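The “AND” and “OR” criteria given for FIG. 57 can be expressed as a small Python classifier. This is a sketch under stated assumptions: “a minimum of two reads” is read as the family containing at least two members, and a candidate supported by one strand while the other strand's family is present but unsupportive is labeled “rejected”, a case the legend does not address. The function names are hypothetical.

```python
def strand_supports(alt_reads: int, total_reads: int,
                    min_fraction: float = 0.90, min_reads: int = 2) -> bool:
    """True if one strand family meets the >=90% / >=2-read criteria."""
    return total_reads >= min_reads and alt_reads / total_reads >= min_fraction


def classify_candidate(watson_alt: int, watson_total: int,
                       crick_alt: int, crick_total: int) -> str:
    """Label a candidate mutation as "AND", "OR", or "rejected"."""
    watson_ok = strand_supports(watson_alt, watson_total)
    crick_ok = strand_supports(crick_alt, crick_total)
    if watson_ok and crick_ok:
        return "AND"
    # "OR": only one of the two families is represented in the data.
    if (watson_ok and crick_total == 0) or (crick_ok and watson_total == 0):
        return "OR"
    return "rejected"
```

For instance, `classify_candidate(3, 3, 2, 2)` falls in the “AND” set, whereas a variant seen in 19 of 20 Watson reads with no Crick reads falls only in the “OR” set.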

FIG. 58 contains graphs showing that rare point mutations accumulate in normal tissues of the colon more than in brain. Point mutation frequency (y-axis) in nuclear (top graph) and mitochondrial (bottom graph) genome in normal brain frontal cortex (left side) and normal colon epithelium (right side) grouped by age (young infant/child in green, young adult in purple, old adult in blue). Averages of each age cohort are shown with error bars representing the standard deviations. Two-way ANOVA with Bonferroni multiple comparison post-test was performed using GraphPad Prism™ 5.0f software with P values reported above bars. n.s. (not significant) indicates P>0.05. For brain, the number of individuals and average age of group are as follows: infant/child: n=3, 3.5 years old (y/o) (BRA01, BRA02, BRA03); young adult: n=3, 22 y/o (BRA04, BRA05, BRA06); and old adult: n=3, 93 y/o (BRA07, BRA08, BRA09). For colon, infant/child: n=2, 5.5 y/o (COL229, COL231); young adult: n=6, 28 y/o (COL235, COL236, COL237, COL373, COL374, COL375); old adult: n=3, 96 y/o (COL232, COL233, COL234).

FIG. 59 contains a graph showing mitochondrial and nuclear point mutation frequencies in normal tissues from the same individual. Data points represent the ratio of mitochondrial to nuclear point mutation frequencies (y-axis) within the normal tissue of the same individual. Individuals were grouped into four cohorts (x-axis) with n=24 individuals for Control (see Table 51), n=2 individuals (COL238, COL239) for DNA repair defect PMS2−/−, n=3 individuals (AA_105, AA_124, AA_126) for aristolochic acid exposure, and n=3 individuals (SA_117, SA_118, SA_119) for smoking exposure. One ratio from the control cohort (COL229) was zero and omitted from this analysis. Average (red line) ratio for each cohort is 24.5 for Control, 0.5 for DNA repair defect PMS2−/−, 1.1 for aristolochic acid exposure, and 2.0 for smoking exposure. *P<0.05, **P<0.01, one-way ANOVA with Bonferroni multiple comparison post-test.

FIG. 60 contains graphs showing that normal tissues and tumors derived from the same tissue type have similar mutation spectra. (A) Pie charts of nuclear and mitochondrial frequencies of each substitution out of the six possible substitution types (see legend) comparing normal (left side) and tumors (right side) derived from colon (top) and kidney (bottom). “Normal” represents the rare mutational spectra data derived from normal tissues shown in FIG. 54. “Nuclear tumor mutations” represent clonal mutation data from colorectal carcinomas (COAD/READ) or clear cell renal carcinoma (KIRC) from the TCGA dataset #!Synapse:syn1729383 found at the website of synapse.org. “mtDNA tumor mutations” from colon and kidney were acquired from “colorectal” and “renal” tumor types in supplementary file 2 of Ju et al. (2014 eLife 3). For normal tissues, the number of substitutions assessed was as follows: colon nuclear n=94 from 13 individuals, colon mtDNA n=116 from 12 individuals, kidney nuclear n=73 from 7 individuals, and kidney mtDNA n=299 from five individuals. For tumor tissue, the number of substitutions assessed was as follows: colorectal carcinoma nuclear n=18,538 from 193 individuals, colorectal carcinoma mtDNA n=64 from 76 individuals, clear cell renal cell carcinoma nuclear n=24,559 from 417 individuals, and renal carcinoma mtDNA n=16 from 23 individuals. (B) Principal component analysis (PCA) of mutational spectra from the cohorts indicated in (A). PCA performed and graphed using R software.

FIG. 61 contains a schematic showing elements of Safe-SeqS. In the first step, each fragment to be analyzed is assigned a unique identification (UID) sequence (metal hatch or stippled bars). In the second step, the uniquely tagged fragments are amplified, producing UID-families, each member of which has the same UID. A super-mutant is defined as a UID-family in which ≥95% of family members have the same mutation.
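The super-mutant definition above lends itself to a short worked example. The following Python sketch groups reads into UID-families and reports families in which at least 95% of members carry the same mutation. The input format (pairs of UID and variant label, with `None` for a read matching the reference) and the function name are assumptions made for illustration; no minimum family size is applied because none is stated in this passage.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def find_super_mutants(
    reads: Iterable[Tuple[str, Optional[str]]],
    min_fraction: float = 0.95,
) -> Dict[str, str]:
    """Map each qualifying UID to the mutation shared by its family.

    reads: (uid, variant) pairs; variant is None when the read matches
    the reference (hypothetical input format).
    """
    families: Dict[str, List[Optional[str]]] = defaultdict(list)
    for uid, variant in reads:
        families[uid].append(variant)

    super_mutants: Dict[str, str] = {}
    for uid, calls in families.items():
        counts = Counter(call for call in calls if call is not None)
        if not counts:
            continue  # no family member carries any mutation
        variant, n_supporting = counts.most_common(1)[0]
        if n_supporting / len(calls) >= min_fraction:
            super_mutants[uid] = variant
    return super_mutants


# Hypothetical reads: every member of family "AAAC" carries the variant,
# while only one of twenty members of family "GGTT" does.
example_reads = (
    [("AAAC", "A>G")] * 20
    + [("GGTT", "C>T")]
    + [("GGTT", None)] * 19
)
example_super_mutants = find_super_mutants(example_reads)
```

The isolated variant in family "GGTT" is discarded, which is the behavior expected for an error introduced after UID assignment.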

FIG. 62 contains a schematic showing an exemplary Safe-SeqS with endogenous UIDs plus capture. The sequences of the ends of each fragment produced by random shearing (variously shaded bars) serve as the unique identifiers (UIDs). These fragments are ligated to adapters (earth hatched and cross hatched bars) so they can subsequently be amplified by PCR. One uniquely identifiable fragment is produced from each strand of the double-stranded template; only one strand is shown. Fragments of interest are captured on a solid phase containing oligonucleotides complementary to the sequences of interest. Following PCR amplification to produce UID-families with primers containing 5′ “grafting” sequences (adhesive filled and light stippled bars), sequencing is performed and super-mutants are defined as in FIG. 61.

FIG. 63 contains a schematic showing an exemplary Safe-SeqS with exogenous UIDs. DNA (sheared or unsheared) is amplified with a set of gene-specific primers. One of the primers has a random DNA sequence (e.g., a set of 14 N's) that forms the unique identifier (UID; variously shaded bars), located 5′ to its gene-specific sequence, and both have sequences that permit universal amplification in the next step (earth hatched and cross hatched bars). Two UID assignment cycles produce two fragments, each with a different UID, from each double-stranded template molecule, as shown. Subsequent PCR with universal primers, which also contain “grafting” sequences (adhesive filled and light stippled bars), produces UID-families which are directly sequenced. Super-mutants are defined as in the legend to FIG. 61.

FIG. 64 contains graphs showing single base substitutions identified by conventional and Safe-SeqS analysis. The exogenous UID strategy depicted in FIG. 63 was used to produce PCR fragments from the CTNNB1 gene of three normal, unrelated individuals. Each position represents one of 87 possible single base substitutions (3 possible substitutions/base × 29 bases analyzed). These fragments were sequenced on an Illumina GA IIx instrument and analyzed in the conventional manner (A) or with Safe-SeqS (B). Safe-SeqS results are displayed on the same scale as conventional analysis for direct comparison; the inset is a magnified view. Note that most of the variants identified by conventional analysis are likely to represent sequencing errors, as indicated by their high frequency relative to Safe-SeqS and their consistency among unrelated samples.

FIG. 65 contains a schematic showing an exemplary Safe-SeqS with endogenous UIDs plus inverse PCR. The sequences of the ends of each fragment produced by random shearing serve as unique identifiers (UIDs; variously shaded bars). These fragments are ligated to adapters (earth hatched and cross hatched bars) as in a standard Illumina library preparation. One uniquely tagged fragment is produced from each strand of the double-stranded template; only one strand is shown. Following circularization with a ligase, inverse PCR is performed with gene-specific primers that also contain 5′ “grafting” sequences (adhesive filled and lightly stippled bars). This PCR produces UID-families which are directly sequenced. Super-mutants are defined as in FIG. 61.

FIG. 66 contains graphs showing single base substitution position vs. error frequency in oligonucleotides synthesized with phosphoramidites and Phusion. A representative portion of the same 31-base DNA fragment synthesized with phosphoramidites (A) or Phusion polymerase (B) was analyzed by Safe-SeqS. The means and standard deviations for seven independent experiments of each type are plotted. There was an average of 1,721±383 and 196±143 SBS super-mutants identified in the phosphoramidite-synthesized and Phusion-generated fragments, respectively. The y-axis indicates the fraction of the total errors at the indicated position. Note that the errors in the phosphoramidite-synthesized DNA fragment were consistent among the seven replicates, as would be expected if the errors were systematically introduced during the synthesis itself. In contrast, the errors in the Phusion-generated fragments appeared to be heterogeneous among samples, as expected from a stochastic process (Luria and Delbruck, 1943 Genetics 28:491-511).

FIG. 67 contains a graph showing UID-family member distribution. The exogenous UID strategy depicted in FIG. 63 was used to produce PCR fragments from a region of CTNNB1 from three normal, unrelated individuals (Table 53); a representative example of the UID-families with <300 members (99% of total UID-families) generated from one individual is shown. The y-axis indicates the number of different UID-families that contained the number of family members shown on the x-axis.

FIGS. 68A-68K contain an exemplary Random Forest model tree for classification of tumor location. FIG. 68A shows the complete tree, and FIGS. 68B-68K contain magnified images of sections of the complete tree as indicated in FIG. 68A.

FIG. 69 contains exemplary rules for tissue recognition extracted from the random forest model. The randomForest function from the randomForest package (v4.6-14) was applied to protein data from the CancerSEEK project. The protein data include 33 proteins and the 626 tumor samples that were predicted correctly as cancer by CancerSEEK. The values of each protein were set to zero if they were less than the 25th percentile of values in the normal samples. To get the specific decision rules (Table 58), the inTrees package (v1.2) was applied to extract all rules of length less than or equal to 6 from all 500 trees created by randomForest. From this set of rules, using the functions (selectRuleRRF, buildLearner, applyLearner) from the inTrees package, relevant and non-redundant rules were selected, a classifier was created and applied to the data, and the final list of rules was extracted. This final list of rules performs similarly to the full random forest and represents a good approximation of the full forest.
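The preprocessing step described for FIG. 69, in which protein values below the lower quartile of the normal samples are set to zero, can be illustrated in Python. This sketch covers only that thresholding step and does not reproduce the randomForest or inTrees analysis, which was performed in R. The function name, the dictionary layout, the protein label, and the choice of the "inclusive" interpolation method are assumptions.

```python
import statistics
from typing import Dict, List


def zero_below_normal_quartile(
    tumor_values: Dict[str, List[float]],
    normal_values: Dict[str, List[float]],
) -> Dict[str, List[float]]:
    """Set each tumor value to 0.0 when it is below the first quartile
    of the same protein's values in the normal samples."""
    adjusted: Dict[str, List[float]] = {}
    for protein, values in tumor_values.items():
        # First of the three quartile cut points; the interpolation
        # method is an assumption, since the text does not specify one.
        first_quartile = statistics.quantiles(
            normal_values[protein], n=4, method="inclusive"
        )[0]
        adjusted[protein] = [0.0 if v < first_quartile else v for v in values]
    return adjusted


# Hypothetical measurements for a single, hypothetical protein.
example_adjusted = zero_below_normal_quartile(
    tumor_values={"protein_X": [1.5, 2.0, 7.0]},
    normal_values={"protein_X": [1.0, 2.0, 3.0, 4.0, 5.0]},
)
```

With these example values the first quartile of the normal samples is 2.0, so only the tumor value 1.5 is replaced by zero.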

DETAILED DESCRIPTION

Definitions

As used herein, the word “a” before a noun represents one or more of the particular noun. For example, the phrase “a genetic alteration” encompasses “one or more genetic alterations.”

As used herein, the term “about” means approximately, in the region of, roughly, or around. When used in conjunction with a numerical range, the term “about” modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term “about” is used herein to modify a numerical value above and below the stated value by a variance of 10%.

As used herein, the term “aneuploidy” refers to the condition of having less than or more than the natural diploid number of chromosomes, or any deviation from euploidy.

As used herein in the context of circulating tumor DNA or cell-free DNA, the phrase “derived from a gene” means that the circulating tumor DNA is shed from tumor cells (e.g., tumor cells that have lysed or otherwise died). For example, circulating tumor DNA “derived from a KRAS gene” means that the circulating tumor DNA was originally present in a tumor cell. When detecting a mutation that is present in circulating tumor DNA derived from a gene, it is not necessary to have first identified the mutation in the tumor cell itself.

The term “driver gene mutation” or “driver mutation” as used herein refers to a mutation that (i) occurs in a driver gene; and (ii) provides a growth advantage to the cell in which it occurs. A growth advantage for a cell can include:

a) an increase in the rate of cell division in a cell having a driver gene mutation, e.g., an increase in rate of cell division as compared to a reference cell, e.g., to an otherwise similar cell, e.g., an otherwise similar cell adjacent to the cell, e.g., as compared to a cell of the same type not having the driver gene mutation;

b) an increase in the rate of clonal expansion in a cell having a driver gene mutation, e.g., an increase in rate of clonal expansion as compared to a reference cell, e.g., to an otherwise similar cell, e.g., an otherwise similar cell adjacent to the cell, e.g., as compared to a cell of the same type not having the driver mutation;

c) an increase in the number of cells that are progeny, e.g., a daughter cell, of the cell that has the driver gene mutation, e.g., an increase in number of progeny cells compared to the number of progeny cells expected if the cell did not have the driver gene mutation;

d) an increase in the ability to form tumors or promote tumor growth, e.g., tumor progression, e.g., as compared to a reference cell, e.g., to an otherwise similar cell not having the driver gene mutation; or

e) presence or appearance at a second or subsequent site or location in the subject.

In an embodiment, a driver gene mutation provides a 0.1-5%, e.g., a 0.1-4.5%, 0.1-4%, 0.1-3.5%, 0.1-3%, 0.1-2.5%, 0.1-2%, 0.1-1.5%, 0.1-1%, 0.1-0.5%, 0.5-5%, 1-5%, 1.5-5%, 2-5%, 2.5-5%, 3-5%, 3.5-5%, 4-5%, 4.5-5%, 0.5-4.5%, 1-4%, 1.5-3.5%, or 2-3%, growth advantage, e.g., increase in the difference between cell birth and cell death. In an embodiment, a driver gene mutation provides at least 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, or 4.5%, e.g., about a 0.4%, growth advantage, e.g., increase in the difference between cell birth and cell death. In an embodiment, a driver gene mutation provides a proliferative capacity to the cell in which it occurs, e.g., allows for cell expansion, e.g., clonal expansion.

In some embodiments, the driver gene mutation can be causally linked to cancer progression.

In an embodiment, the driver gene mutation affects, e.g., alters the regulation, expression or function of, a protein coding gene. In an embodiment, a driver gene mutation affects, e.g., alters the function of, a noncoding region, e.g., non-protein coding region. In an embodiment, a driver gene mutation includes: a translocation, a deletion (e.g., a homozygous deletion), an insertion (e.g., an intragenic insertion), a small insertion and deletion (indel), a single base substitution (e.g., a synonymous mutation, non-synonymous mutation, nonsense mutation or a frameshift mutation), a copy number variation (CNV) (e.g., an amplification), or a single nucleotide variation (SNV) (e.g., a single nucleotide polymorphism (SNP)). Exemplary driver mutations can be found in Tables 60 and 61.

In some embodiments, the presence of a driver gene mutation in a cell can alter (e.g., increase or decrease) the expression of the gene product in that cell. In some embodiments, the presence of a driver gene mutation in a cell can alter the function of the gene product. In some cases, the presence of a driver gene mutation in a cell can provide that cell with a growth advantage. For example, the presence of a driver gene mutation in a cell can cause an increase in the rate of proliferation (e.g., as compared to a reference cell). For example, the presence of a driver gene mutation in a cell can cause an increase in the rate of clonal expansion in a cell having a driver gene mutation (e.g., as compared to a reference cell). For example, the presence of a driver gene mutation in a cell can cause an increase in the number of progeny cells derived from the cell having the driver gene mutation (e.g., as compared to a reference cell). For example, the presence of a driver gene mutation in a cell can cause an increase in the ability of the cell to form a tumor (e.g., as compared to a reference cell). In some cases, a growth advantage can be measured as an increase in the difference between cytogenesis (e.g., the formation of new cells) and cell death. For example, the presence of a driver gene mutation in a cell can provide that cell with a growth advantage of at least about 0.1% (e.g., about 0.2%, about 0.3%, about 0.4%, about 0.5%, about 0.6%, about 0.7%, about 0.8%, about 0.9%, about 1%, about 1.5%, about 2%, about 2.5%, about 3%, about 3.5%, about 4%, about 4.5%, or more).
For example, the presence of a driver gene mutation in a cell can provide that cell with a growth advantage of from about 0.1% to about 5% (e.g., from about 0.1 to about 5%, from about 0.1 to about 4.5%, from about 0.1 to about 4%, from about 0.1 to about 3.5%, from about 0.1 to about 3%, from about 0.1 to about 2.5%, from about 0.1 to about 2%, from about 0.1 to about 1.5%, from about 0.1 to about 1%, from about 0.1 to about 0.5%, from about 0.5 to about 5%, from about 1 to about 5%, from about 1.5 to about 5%, from about 2 to about 5%, from about 2.5 to about 5%, from about 3 to about 5%, from about 3.5 to about 5%, from about 4 to about 5%, from about 4.5 to about 5%, from about 0.5 to about 4.5%, from about 1 to about 4%, from about 1.5 to about 3.5%, or from about 2 to about 3%).

In some cases, a driver gene can include more than one (e.g., two, three, four, five, six, seven, eight, nine, ten, or more) driver gene mutations. In some cases, a driver gene including one or more driver gene mutations also can include one or more additional mutations (e.g., passenger gene mutations (somatic mutations which are not a driver mutation)).

The term “driver gene” as used herein refers to a gene which includes a driver gene mutation. In one embodiment, the driver gene is a gene in which one or more (e.g., one, two, three, four, five, six, seven, eight, nine, ten, or more) acquired mutations, e.g., driver gene mutations, can be causally linked to cancer progression. In an embodiment, a driver gene modulates one or more cellular processes including: cell fate determination, cell survival and genome maintenance. A driver gene can be associated with (e.g., can modulate) one or more signaling pathways. Examples of signaling pathways include, without limitation, a TGF-beta pathway, a MAPK pathway, a STAT pathway, a PI3K pathway, a RAS pathway, a cell cycle pathway, an apoptosis pathway, a NOTCH pathway, a Hedgehog (HH) pathway, an APC pathway, a chromatin modification pathway, a transcriptional regulation pathway, and a DNA damage control pathway. Examples of driver genes include, without limitation, ABL1, ACVR1B, AKT1, ALK, APC, AR, ARID1A, ARID1B, ARID2, ASXL1, ATM, ATRX, AXIN1, B2M, BAP1, BCL2, BCOR, BRAF, BRCA1, BRCA2, CARD11, CASP8, CBL, CDC73, CDH1, CDKN2A, CEBPA, CIC, CREBBP, CRLF2, CSF1R, CTNNB1, CYLD, DAXX, DNMT1, DNMT3A, EGFR, EP300, ERBB2, EZH2, FAM123B, FBXW7, FGFR2, FGFR3, FLT3, FOXL2, FUBP1, GATA1, GATA2, GATA3, GNA11, GNAQ, GNAS, H3F3A, HIST1H3B, HNF1A, HRAS, IDH1, IDH2, JAK1, JAK2, JAK3, KDM5C, KDM6A, KIT, KLF4, KRAS, MAP2K1, MAP3K1, MED12, MEN1, MET, MLH1, MLL2, MLL3, MPL, MSH2, MSH6, MYD88, NCOR1, NF1, NF2, NFE2L2, NOTCH1, NOTCH2, NPM1, NRAS, PAX5, PBRM1, PDGFRA, PHF6, PIK3CA, PIK3R1, PPP2R1A, PRDM1, PTCH1, PTEN, PTPN11, RB1, RET, RNF43, RUNX1, SETD2, SETBP1, SF3B1, SMAD2, SMAD4, SMARCA4, SMARCB1, SMO, SOCS1, SOX9, SPOP, SRSF2, STAG2, STK11, TET2, TNFAIP3, TRAF7, TP53, TSC1, TSHR, U2AF1, VHL, WT1, CCND1, CDKN2C, IKZF1, LMO1, MAP2K4, MDM2, MDM4, MYC, MYCL1, MYCN, NCOA3, NKX2-1, and SKP2. Exemplary driver genes include oncogenes and tumor suppressors.
In an embodiment, a driver gene has one or more driver gene mutations, e.g., as described herein. In an embodiment, a driver gene is a gene listed in Tables 60 or 61. In an embodiment, a driver gene is a gene that modulates one or more cellular processes described in Tables 60 or 61, e.g., cell fate determination, cell survival and genome maintenance. In an embodiment, a driver gene is a gene that modulates one or more pathways described in Tables 60 or 61. In an embodiment, a driver gene is a gene that modulates one or more signaling pathways described in Table 62.

In an embodiment, a driver gene includes more than one driver mutation, and the first driver gene mutation provides a selective growth advantage to the cell in which it occurs. In an embodiment, the subsequent mutation, e.g., second, third, fourth, fifth or later mutation, e.g., driver mutation in the driver gene, provides a proliferative capacity to the cell in which it occurs, e.g., allows for cell expansion, e.g., clonal expansion. In an embodiment, a driver gene has one or more passenger gene mutations, e.g., a somatic mutation that arises in the development of a cancer but which is not a driver mutation. In an embodiment, a driver gene can be present, e.g., expressed, in any cell type, e.g., a cell type derived from any one of the three germ cell layers: ectoderm, endoderm or mesoderm. In an embodiment, a driver gene is present, e.g., expressed, in a somatic cell. In an embodiment, a driver gene is present, e.g., expressed, in a germ cell. In an embodiment, a driver gene can be present in a large number of cancers, e.g., in more than 5% of cancers. In an embodiment, a driver gene can be present in a small number of cancers, e.g., in less than 5% of cancers. In an embodiment, a driver gene has a mutation pattern that is non-random and/or recurrent, i.e., the location at which a driver mutation occurs in the driver gene is the same in different cancer types. Exemplary recurrent driver gene mutations include mutations in the IDH1 gene at the substrate binding site, e.g., at codon 132, and mutations in the PIK3CA gene in the helical domain or the kinase domain, as depicted in Vogelstein et al. (2013) Science 339:1546-1558.

In an embodiment, a driver gene having a driver gene mutation is an oncogene. In an embodiment, an oncogene is a gene with an oncogene score of at least 20%, e.g., at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99% or 100%. In an embodiment, an oncogene score is defined as the number of mutations, e.g., clustered mutations (e.g., missense mutations at the same amino acid, or identical in-frame insertions or deletions) divided by the total number of mutations. In an embodiment, a driver gene having an amplification, e.g., as described herein, is an oncogene. In an embodiment, a driver gene having a driver gene mutation is a tumor suppressor gene (TSG). In an embodiment, a tumor suppressor gene is a gene with a tumor suppressor gene score of at least 20%, e.g., at least 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 99% or 100%. In an embodiment, a tumor suppressor gene score is defined as the number of inactivating mutations divided by the total number of mutations. In an embodiment, a driver gene having a deletion, e.g., as described herein, is a tumor suppressor gene.
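The two scores defined above are simple ratios, and a brief Python sketch may make the arithmetic concrete. The function name, the returned dictionary keys, and the example counts are hypothetical. Because a gene can meet both 20% thresholds and this passage does not state how such a case is resolved, the sketch reports both comparisons rather than assigning a single class.

```python
from typing import Dict, Union


def score_gene(
    clustered_mutations: int,
    inactivating_mutations: int,
    total_mutations: int,
    threshold: float = 0.20,
) -> Dict[str, Union[float, bool]]:
    """Compute the oncogene score and tumor suppressor gene (TSG) score.

    oncogene score = clustered mutations / total mutations
    TSG score      = inactivating mutations / total mutations
    """
    if total_mutations <= 0:
        raise ValueError("total_mutations must be positive")
    oncogene_score = clustered_mutations / total_mutations
    tsg_score = inactivating_mutations / total_mutations
    return {
        "oncogene_score": oncogene_score,
        "tsg_score": tsg_score,
        "meets_oncogene_threshold": oncogene_score >= threshold,
        "meets_tsg_threshold": tsg_score >= threshold,
    }


# Hypothetical counts: 93 clustered and 1 inactivating mutation among
# 100 recorded mutations.
example_scores = score_gene(
    clustered_mutations=93, inactivating_mutations=1, total_mutations=100
)
```

With these counts the oncogene score is 93% and the tumor suppressor gene score is 1%, so only the oncogene threshold is met.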

TABLE 60 Driver genes having mutations

Gene Symbol | Gene Name | Tumor Samples with mutations | Oncogene score | Tumor Suppressor Gene score | Oncogene or tumor suppressor (TSG) | Pathways | Cellular Process
ABL1 | c-abl oncogene 1, receptor tyrosine kinase | 851 | 93% | 0% | Oncogene | Cell Cycle/Apoptosis | Cell Survival
ACVR1B | activin A receptor, type IB | 17 | 0% | 42% | TSG | TGF-β | Cell Survival
AKT1 | v-akt murine thymoma viral oncogene homolog 1 | 155 | 93% | 1% | Oncogene | PI3K | Cell Survival
ALK | anaplastic lymphoma receptor tyrosine kinase | 189 | 72% | 1% | Oncogene | PI3K; RAS | Cell Survival
APC | adenomatous polyposis coli | 2561 | 2% | 92% | TSG | APC | Cell Fate
AR | androgen receptor | 23 | 54% | 0% | Oncogene | Transcriptional Regulation | Cell Fate
ARID1A | AT rich interactive domain 1A (SWI-like) | 234 | 1% | 83% | TSG | Chromatin Modification | Cell Fate
ARID1B | AT rich interactive domain 1B (SWI1-like) | 17 | 0% | 50% | TSG | Chromatin Modification | Cell Fate
ARID2 | AT rich interactive domain 2 (ARID, RFX-like) | 45 | 0% | 56% | TSG | Chromatin Modification | Cell Fate
ASXL1 | additional sex combs like 1 (Drosophila) | 442 | 5% | 87% | TSG | Chromatin Modification | Cell Fate
ATM | similar to Serine-protein kinase ATM (Ataxia telangiectasia mutated) (A-T, mutated); ataxia telangiectasia mutated | 242 | 24% | 30% | TSG | DNA Damage Control | Genome Maintenance
ATRX | alpha thalassemia/mental retardation syndrome X-linked (RAD54 homolog, S. cerevisiae) | 50 | 4% | 47% | TSG | Chromatin Modification | Cell Fate
AXIN1 | axin 1 | 117 | 20% | 27% | TSG | APC | Cell Fate
B2M | beta-2-microglobulin | 30 | 18% | 39% | TSG | PI3K; RAS; MAPK | Cell Survival
BAP1 | BRCA1 associated protein-1 (ubiquitin carboxy-terminal hydrolase) | 99 | 8% | 70% | TSG | DNA Damage Control | Genome Maintenance
BCL2 | B-cell CLL/lymphoma 2 | 45 | 27% | 1% | Oncogene | Cell Cycle/Apoptosis | Cell Survival
BCOR | BCL6 co-repressor | 21 | 0% | 70% | TSG | Transcriptional Regulation | Cell Fate
BRAF | v-raf murine sarcoma viral oncogene homolog B1 | 24288 | 100% | 0% | Oncogene | RAS | Cell Survival
BRCA1 | breast cancer 1, early onset | 62 | 0% | 69% | TSG | DNA Damage Control | Genome Maintenance
BRCA2 | breast cancer 2, early onset | 67 | 0% | 30% | TSG | DNA Damage Control | Genome Maintenance
CARD11 | caspase recruitment domain family, member 11 | 74 | 30% | 1% | Oncogene | Cell Cycle/Apoptosis | Cell Survival
CASP8 | caspase 8, apoptosis-related cysteine peptidase | 21 | 0% | 52% | TSG | Cell Cycle/Apoptosis | Cell Survival
CBL | Cas-Br-M (murine) ecotropic retroviral transforming sequence | 168 | 57% | 9% | Oncogene | PI3K; RAS | Cell Survival
CDC73 | cell division cycle 73, Paf1/RNA polymerase II complex component, homolog (S. cerevisiae) | 45 | 4% | 78% | TSG | Cell Cycle/Apoptosis | Cell Survival
CDH1 | cadherin 1, type 1, E-cadherin (epithelial) | 200 | 14% | 52% | TSG | APC | Cell Fate
CDKN2A | cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4) | 968 | 32% | 49% | TSG | Cell Cycle/Apoptosis | Cell Survival
CEBPA | CCAAT/enhancer binding protein (C/EBP), alpha | 448 | 30% | 54% | TSG | PI3K; RAS; MAPK | Cell Survival
CIC | capicua homolog (Drosophila) | 47 | 12% | 31% | TSG | RAS | Cell Survival
CREBBP | CREB binding protein | 151 | 24% | 34% | TSG | Chromatin Modification; Transcriptional Regulation | Cell Fate
CRLF2 | cytokine receptor-like factor 2 | 10 | 100% | 0% | Oncogene | STAT | Cell Survival
CSF1R | colony stimulating factor 1 receptor | 48 | 50% | 15% | Oncogene | PI3K; RAS | Cell Survival
CTNNB1 | catenin (cadherin-associated protein), beta 1, 88 kDa | 3262 | 92% | 1% | Oncogene | APC | Cell Fate
CYLD | cylindromatosis (turban tumor syndrome) | 26 | 0% | 85% | TSG | Cell Cycle/Apoptosis | Cell Survival
DAXX | death-domain associated protein | 28 | 7% | 61% | TSG | Chromatin Modification; Cell Cycle/Apoptosis | Cell Fate
DNMT1 | DNA (cytosine-5-)-methyltransferase 1 | 22 | 36% | 5% | Oncogene | Chromatin Modification | Cell Fate
DNMT3A | DNA (cytosine-5-)-methyltransferase 3 alpha | 788 | 74% | 12% | Oncogene | Chromatin Modification | Cell Fate
EGFR | epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) | 10628 | 97% | 0% | Oncogene | PI3K; RAS | Cell Survival
EP300 | E1A binding protein p300 | 88 | 12% | 32% | TSG | Chromatin Modification; APC; TGF-β; NOTCH | Cell Survival/Fate
ERBB2 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) | 164 | 67% | 3% | Oncogene | PI3K; RAS | Cell Survival
EZH2 | enhancer of zeste homolog 2 (Drosophila) | 276 | 67% | 12% | Oncogene | Chromatin Modification | Cell Fate
FAM123B | family with sequence similarity 123B | 55 | 4% | 66% | TSG | APC | Cell Fate
FBXW7 | F-box and WD repeat domain containing 7 | 312 | 55% | 18% | TSG | NOTCH | Cell Fate
FGFR2 | fibroblast growth factor receptor 2 | 121 | 49% | 6% | Oncogene | PI3K; RAS; STAT | Cell Survival
FGFR3 | fibroblast growth factor receptor 3 | 2948 | 99% | 0% | Oncogene | PI3K; RAS; STAT | Cell Survival
FLT3 | fms-related tyrosine kinase 3 | 11520 | 98% | 0% | Oncogene | RAS; PI3K; STAT | Cell Survival
FOXL2 | forkhead box L2 | 330 | 100% | 0% | Oncogene | TGF-β | Cell Fate
FUBP1 | far upstream element (FUSE) binding protein 1 | 9 | 0% | 70% | TSG | Cell Cycle/Apoptosis | Cell Survival
GATA1 | GATA binding protein 1 (globin transcription factor 1) | 203 | 8% | 84% | TSG | NOTCH, TGF-β | Cell Fate
GATA2 | GATA binding protein 2 | 45 | 53% | 4% | Oncogene | NOTCH, TGF-β | Cell Fate
GATA3 | GATA binding protein 3 | 33 | 9% | 66% | TSG | Transcriptional Regulation | Cell Fate
GNA11 | guanine nucleotide binding protein (G protein), alpha 11 (Gq class) | 110 | 92% | 1% | Oncogene | PI3K; RAS; MAPK | Cell Survival
GNAQ | guanine nucleotide binding protein (G protein), q polypeptide | 245 | 95% | 1% | Oncogene | PI3K; RAS; MAPK | Cell Survival
GNAS | GNAS complex locus | 422 | 93% | 2% | Oncogene | APC; PI3K; TGF-β, RAS | Cell Survival/Cell Fate
H3F3A | H3 histone, family 3B (H3.3B); H3 histone, family 3A pseudogene; H3 histone, family 3A; similar to H3 histone, family 3B; similar to histone H3.3B | 122 | 93% | 0% | Oncogene | Chromatin Modification | Cell Fate
HIST1H3B | histone cluster 1, H3j; histone cluster 1, H3i; histone cluster 1, H3h; histone cluster 1, H3g; histone cluster 1, H3f; histone cluster 1, H3e; histone cluster 1, H3d; histone cluster 1, H3c; histone cluster 1, H3b; histone cluster 1, H3a; histone cluster 1, H2ad; histone cluster 2, H3a; histone cluster 2, H3c; histone cluster 2, H3d | 25 | 60% | 0% | Oncogene | Chromatin Modification | Cell Fate
HNF1A | HNF1 homeobox A | 126 | 29% | 55% | TSG | APC | Cell Fate
HRAS | v-Ha-ras Harvey rat | 812 | 96% | 0%
Oncogene RAS Cell Survival sarcoma viral oncogene homologIDH1 isocitrate 4509 100%   0% Oncogene Chromatin Cell Fatedehydrogenase 1 Modification (NADP+), soluble IDH2 isocitrate 1029 99% 0% Oncogene Chromatin Cell Fate dehydrogenase 2 Modification (NADP+),mitochondrial JAK1 Janus kinase 1 61 26% 18% Oncogene STAT Cell SurvivalJAK2 Janus kinase 2 32692 100%   0% Oncogene STAT Cell Survival JAK3Janus kinase 3 89 60%  6% Oncogene STAT Cell Survival KDM5C lysine(K)-specific 26  0% 62% TSG Chromatin Cell Fate demethylase 5CModification KDM6A lysine (K)-specific 66  0% 72% TSG Chromatin CellFate demethylase 6A Modification KIT similar to Mast/stem 4720 90%  0%Oncogene PI3K; RAS; STAT Cell Survival cell growth factor receptorprecursor (SCFR) (Proto- oncogene tyrosine- protein kinase Kit) (c-kit)(CD117 antigen); v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogenehomolog KLF4 Kruppel-like factor 4 61 80%  4% Oncogene TranscriptionalCell Fate Regulation; WNT KRAS v-Ki-ras2 Kirsten rat 23261 100%   0%Oncogene RAS Cell Survival sarcoma viral oncogene homolog MAP2K1mitogen-activated 13 67%  0% Oncogene RAS Cell Survival protein kinasekinase 1 MAP3K1 mitogen-activated 11  0% 63% TSG RAS; MAPK Cell Survivalprotein kinase kinase kinase 1 MED12 mediator complex 337 84%  0%Oncogene Cell Cell Survival subunit 12 Cycle/Apoptosis; TGF-□□ MEN1multiple endocrine 290  7% 68% TSG Chromatin Cell Fate neoplasia IModification MET met proto-oncogene 159 61%  4% Oncogene PI3K; RAS CellSurvival (hepatocyte growth factor receptor) MLH1 mutL homolog 1, 61 18%37% TSG DNA Damage Genome colon cancer, Control Maintenance nonpolyposistype 2 (E. 
coli) MLL2 myeloid/lymphoid or 165  1% 70% TSG Chromatin CellFate mixed-lineage Modification leukemia 2 MLL3 myeloid/lymphoid or 111 5% 44% TSG Chromatin Cell Fate mixed-lineage Modification leukemia 3MPL myeloproliferative 531 96%  0% Oncogene STAT Cell SUrvival leukemiavirus oncogene MSH2 mutS homolog 2, 37  0% 65% TSG DNA Damage Genomecolon cancer, Control Maintenance nonpolyposis type 1 (E. coli) MSH6mutS homolog 6 135  3% 68% TSG DNA Damage Genome (E. coli) ControlMaintenance MYD88 myeloid 134 92%  1% Oncogene Cell Cell Survivaldifferentiation Cycle/Apoptosis primary response gene (88) NCOR1 nuclearreceptor co- 35 11% 32% TSG Chromatin Cell Fate repressor 1 ModificationNF1 neurofibromin 1 362  2% 73% TSG RAS Cell Survival NF2 neurofibromin2 609  4% 89% TSG APC Cell Fate (merlin) NFE2L2 nuclear factor 102 74% 1% Oncogene Cell Cell Survival (erythroid-derived Cycle/Apoptosis2)-like 2 NOTCH1 Notch homolog 1, 661 44% 27% TSG NOTCH Cell Fatetranslocation- associated (Drosophila) NOTCH2 Notch homolog 2 51  0% 27%TSG NOTCH Cell Fate (Drosophila) NPM1 nucleophosmin 1 2471  2% 98% TSGCell Cell Survival (nucleolar Cycle/Apoptosis phosphoprotein B23,numatrin) pseudogene 21; hypothetical LOC100131044; similar tonucleophosmin 1; nucleophosmin (nucleolar phosphoprotein B23, numatrin)NRAS neuroblastoma RAS 2738 99%  0% Oncogene RAS Cell Survival viral(v-ras) oncogene homolog PAX5 paired box 5 49 42% 26% TSG Chromatin CellFate Modification PBRM1 polybromo 1 171  0% 83% TSG Chromatin Cell FateModification PDGFRA platelet-derived 653 84%  1% Oncogene PI3K; RAS CellSurvival growth factor receptor, alpha polypeptide PHF6 PHD finger 5718% 61% TSG Transcriptional Cell Fate protein 6 Regulation PIK3CAphosphoinositide-3- 4560 95%  1% Oncogene PI3K Cell Survival kinase,catalytic, alpha polypeptide PIK3R1 phosphoinositide-3- 88 14% 37% TSGPI3K Cell Survival kinase, regulatory subunit 1 (alpha) PPP2R1A proteinphosphatase 86 85%  2% Oncogene Cell Cell Survival 2 (formerly 
2A),Cycle/Apoptosis regulatory subunit A, alpha isoform PRDM1 PR domain 46 0% 64% TSG Chromatin Cell Fate containing 1, with Modification ZNFdomain PTCH1 patched homolog 1 318  7% 60% TSG HH Cell Fate (Drosophila)PTEN phosphatase and 1719 30% 55% TSG PI3K Cell Survival tensin homolog;phosphatase and tensin homolog pseudogene 1 PTPN11 protein tyrosine 41090%  0% Oncogene RAS Cell Survival phosphatase, non- receptor type 11;similar to protein tyrosine phosphatase, non- receptor type 11 RB1retinoblastoma 1 208  4% 80% TSG Cell Cell Survival Cycle/Apoptosis RETret proto-oncogene 500 86%  1% Oncogene RAS; PI3K Cell Survival RNF43ring finger protein 27  7% 43% TSG APC Cell Fate 43 RUNX1 runt-related304 34% 41% TSG Transcriptional Cell Fate transcription factor 1Regulation SETD2 SET domain 47  3% 47% TSG Chromatin Cell Fatecontaining 2 Modification SETBP1 SET binding protein 1 95 25%  4%Oncogene Chromatin Cell Fate Modification; Replication SF3B1 splicingfactor 3b, 516 91%  0% Oncogene Transcriptional Cell Fate subunit 1, 155kDa Regulation SMAD2 SMAD family 16  0% 41% TSG TGF-□□ Cell Survivalmember 2 SMAD4 SMAD family 207 24% 39% TSG TGF-□□ Cell Survival member 4SMARCA4 SWI/SNF related, 68 22% 22% TSG Chromatin Cell Fate matrixassociated, Modification actin dependent regulator of chromatin,subfamily a, member 4 SMARCB1 SWI/SNF related, 247 16% 74% TSG ChromatinCell Fate matrix associated, Modification actin dependent regulator ofchromatin, subfamily b, member 1 SMO smoothened 34 51%  3% Oncogene HHCell Fate homolog (Drosophila) SOCS1 suppressor of 41 15% 46% TSG STATCell Survival cytokine signaling 1 50X9 SRY (sex 9  0% 70% TSG APC CellSurvival determining region Y)-box 9 SPOP speckle-type POZ 35 66%  3%Oncogene Chromatin Cell Fate protein Modification; HH SRSF2 SRSF2 27395%  2% Oncogene Transcriptional Cell Fate serine/arginine-richRegulation splicing factor 2 STAG2 stromal antigen 2 21  0% 33% TSG DNAGenome Damage Maintenance Control STK11 serine/threonine 220 24% 52% 
TSGmTOR Cell Survival kinase 11 TET2 tet oncogene family 864 14% 70% TSGChromatin Cell Fate member 2 Modification TNFAIP3 tumor necrosis 136  1%80% TSG Cell Cell Survival factor, alpha- Cycle/Apoptosis; inducedprotein 3 MAPK TRAF7 TNF receptor- 123 61%  9% TSG Apoptosis CellSurvival associated factor 7 TP53 tumor protein p53 14438 73% 20% TSGCell Cell Survival Cycle/Apoptosis; DNA Damage Control TSC1 tuberoussclerosis 1 20  0% 45% TSG PI3K Cell SUrvival TSHR thyroid stimulating301 86%  0% Oncogene PI3K; MAPK Cell Survival hormone receptor U2AF1 U2small nuclear 96 92%  1% Oncogene Transcriptional Cell Fate RNAauxiliary Regulation factor 1 VHL von Hippel-Lindau 1287 27% 60% TSGPI3K; RAS; STAT Cell Survival tumor suppressor WT1 Wilms tumor 1 312 10%79% TSG Chromatin Cell Fate Modification

TABLE 61 List of driver genes with amplifications or deletions

Gene Symbol | Gene Name | Amplification or deletion | Oncogene or tumor suppressor (TSG) | Pathway | Cellular process
CCND1 | cyclin D1 | Amplification | Oncogene | Cell Cycle/Apoptosis | Cell Survival
CDKN2C | cyclin-dependent kinase inhibitor 2C (p18, inhibits CDK4) | Homozygous deletion | TSG | Cell Cycle/Apoptosis | Cell Survival
IKZF1 | IKAROS family zinc finger 1 (Ikaros) | Homozygous deletion | TSG | Transcriptional Regulation | Cell Fate
LMO1 | LIM domain only 1 (rhombotin 1) | Amplification | Oncogene | Transcriptional Regulation | Cell Fate
MAP2K4 | mitogen-activated protein kinase kinase 4 | Homozygous deletion | TSG | MAPK | Cell Survival
MDM2 | Mdm2 p53 binding protein homolog (mouse) | Amplification | Oncogene | Cell Cycle/Apoptosis | Cell Survival
MDM4 | Mdm4 p53 binding protein homolog (mouse) | Amplification | Oncogene | Cell Cycle/Apoptosis | Cell Survival
MYC | v-myc myelocytomatosis viral oncogene homolog (avian) | Amplification | Oncogene | Cell Cycle/Apoptosis | Cell Survival
MYCL1 | v-myc myelocytomatosis viral oncogene homolog 1, lung carcinoma derived (avian) | Amplification | Oncogene | Cell Cycle/Apoptosis | Cell Survival
MYCN | v-myc myelocytomatosis viral related oncogene, neuroblastoma derived (avian) | Amplification | Oncogene | Cell Cycle/Apoptosis | Cell Survival
NCOA3 | nuclear receptor coactivator 3 | Amplification | Oncogene | Chromatin Modification | Cell Fate
NKX2-1 | NK2 homeobox 1 | Amplification | Oncogene | PI3K; MAPK | Cell Survival
SKP2 | S-phase kinase-associated protein 2 (p45) | Amplification | Oncogene | Cell Cycle/Apoptosis | Cell Survival

TABLE 62 Cancer cell signaling pathways

Cellular process | Signaling pathway
Cell survival | TGF-beta; MAPK; STAT; PI3K; RAS; Cell cycle/apoptosis
Cell fate | NOTCH; Hedgehog (HH); APC; Chromatin modification; Transcriptional regulation
Genome maintenance | DNA damage control

As used herein, the term “plurality” refers to two or more of the particular parameter, element or characteristic that it is modifying. For example, the phrase “a plurality of genetic cancers” encompasses “two or more cancers.”

As used herein, the phrases “genetic biomarker” and “genetic marker” refer to a nucleic acid that is characteristic, alone or in combination with other genetic or other biomarkers, of cancer in a subject. A genetic biomarker can include a modification (e.g., a mutation) in a gene. Examples of modifications include, without limitation, single base substitutions, insertions, deletions, indels, translocations, and copy number variations. In some embodiments, a genetic biomarker includes a modification (e.g., an inactivating modification) in a tumor suppressor gene. In some embodiments, a genetic biomarker includes a modification (e.g., an activating modification) in an oncogene. Various genetic biomarkers and genetic biomarker panels are described in more detail herein.

As used herein, the terms “mutation”, “genetic modification”, and “genetic alteration” are used interchangeably to indicate a change in a wild type nucleic acid sequence. For example, in some embodiments, methods of detecting a mutation in cell-free DNA (e.g., ctDNA) are described herein. It is to be understood that such methods can be interchangeably described as detecting mutations, genetic modifications, or genetic alterations.

As used herein, the phrases “protein biomarker”, “protein marker”, “peptide biomarker”, and “peptide marker” refer to a protein that is characteristic, alone or in combination with other protein or other biomarkers, of cancer in a subject. In some embodiments, a protein biomarker includes an elevated level of the protein in a subject (e.g., a subject having cancer regardless of whether the subject is known to have cancer) as compared to a reference subject that does not have cancer. In some embodiments, a protein biomarker includes a decreased level of the protein in a subject (e.g., a subject having cancer regardless of whether the subject is known to have cancer) as compared to a reference subject that does not have cancer. As used herein, the phrase “detecting a protein biomarker” can refer to detecting a level (e.g., an increased level or a decreased level) of the protein biomarker. Various protein biomarkers and protein biomarker panels are described in more detail herein. In some embodiments, peptides that are distinct from a protein biomarker are used in methods provided herein.

As used herein, the phrase “region of interest” refers to a subgenomic portion of genomic sequence (also referred to as a “subgenomic interval”). A region of interest can be any appropriate size (e.g., can include any appropriate number of nucleotides). In some embodiments, a region of interest or subgenomic interval can include a single nucleotide (e.g., single nucleotide for which variants thereof are associated (positively or negatively) with a tumor phenotype). In some embodiments, a region of interest or subgenomic interval can include more than one nucleotide. For example, a region of interest or subgenomic interval can include at least about 2 (e.g., about 5, about 10, about 50, about 100, about 150, about 250, or about 300) nucleotides. In some cases, a region of interest or subgenomic interval can include an entire gene. In some cases, a region of interest or subgenomic interval can include a portion of a gene (e.g., a coding region such as an exon, a non-coding region such as an intron, or a regulatory region such as a promoter, enhancer, 5′ untranslated region (5′ UTR), or 3′ untranslated region (3′ UTR)). In some cases, a region of interest or subgenomic interval can include all or part of a naturally occurring (e.g., genomic) nucleotide sequence. For example, a region of interest or subgenomic interval can correspond to a fragment of genomic DNA which can be subjected to a sequencing reaction. In some cases, a region of interest or subgenomic interval can be a continuous nucleotide sequence from a genomic source. In some cases, a region of interest or subgenomic interval can include nucleotide sequences that are not contiguous within the genome. For example, a region of interest or subgenomic interval can include a nucleotide sequence that includes an exon-exon junction (e.g., in cDNA reverse transcribed from the region of interest or subgenomic interval). In some cases, a region of interest or subgenomic interval can include a mutation (e.g., an SNV, an SNP, a somatic mutation, a germline mutation, a point mutation, a rearrangement, a deletion mutation (e.g., an in-frame deletion, an intragenic deletion, or a full gene deletion), an insertion mutation (e.g., an intragenic insertion), an inversion mutation (e.g., an intra-chromosomal inversion), an inverted duplication mutation, a tandem duplication (e.g., an intrachromosomal tandem duplication), a translocation (e.g., a chromosomal translocation, or a non-reciprocal translocation), a change in gene copy number, or any combination thereof).

The “copy number of a gene” refers to the number of DNA sequences in a cell encoding a particular gene product. Generally, a mammal has two copies of a given gene. The copy number can be increased, e.g., by gene amplification or duplication, or reduced by deletion.

As used herein with reference to protein biomarkers, the phrase “elevated level” refers to a level of the protein biomarker that is greater than a reference level of the protein biomarker typically observed in a sample (e.g., a reference sample) from a healthy subject (e.g., a subject that does not exhibit a particular disease or condition). In some embodiments, a reference sample can be a sample obtained from a subject (e.g., a different or reference subject) that does not have a cancer. For example, for a protein biomarker associated with colorectal cancer, a reference sample can be a sample obtained from a different or reference subject that does not have colorectal cancer. In some embodiments, a reference sample can be a sample obtained from the same subject in which the elevated level of a protein biomarker is observed, where the reference sample was obtained prior to onset of the cancer. In some embodiments, such a reference sample obtained from the same subject is frozen or otherwise preserved for future use as a reference sample. In some embodiments, when reference samples have undetectable levels of a protein biomarker, an elevated level can be any detectable level of the protein biomarker. It will be appreciated that levels from comparable samples can be used when determining whether or not a particular level is an elevated level.

As used herein with reference to protein biomarkers, the phrase “reference level” refers to the level of the protein biomarker that is typically present in a healthy subject (e.g., a subject that does not exhibit a particular disease or condition). A reference level of a protein biomarker can be a level that is present in a reference subject that does not exhibit a disease or condition (e.g., cancer). For example, for a protein biomarker associated with colorectal cancer, a reference sample can be a sample obtained from a subject that does not have colorectal cancer. As another example, a reference level of a protein biomarker can be a level that is present in a subject prior to the onset of the disease or condition (e.g., cancer) in that subject. In some embodiments, a disease or condition can be identified in a subject when the measured or detected level of one or more protein biomarkers is higher than reference level(s) of the one or more protein biomarkers.

As used herein, the term “sensitivity” refers to the ability of a method to detect or identify the presence of a disease in a subject. For example, when used in reference to any of the variety of methods described herein that can detect the presence of cancer in a subject, a high sensitivity means that the method correctly identifies the presence of cancer in the subject a large percentage of the time. For example, a method described herein that correctly detects the presence of cancer in a subject 95% of the time the method is performed is said to have a sensitivity of 95%. In some embodiments, a method described herein that can detect the presence of cancer in a subject provides a sensitivity of at least 70% (e.g., about 70%, about 72%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, about 99.5%, or about 100%). In some embodiments, methods provided herein that include detecting the presence of one or more members of two or more classes of biomarkers (e.g., genetic biomarkers and/or protein biomarkers) provide a higher sensitivity than methods that include detecting the presence of one or more members of only one class of biomarkers.

In some embodiments, sensitivity provides a measure of the ability of a method to detect a sequence variant in a heterogeneous population of sequences. A method has a sensitivity of S % for variants of F % if, given a sample in which the sequence variant is present as at least F % of the sequences in the sample, the method can detect the sequence at a confidence of C %, S % of the time. By way of example, a method has a sensitivity of 90% for variants of 5% if, given a sample in which the variant sequence is present as at least 5% of the sequences in the sample, the method can detect the sequence at a confidence of 99%, 9 out of 10 times (F=5%; C=99%; S=90%). Exemplary sensitivities include those of S=90%, 95%, 99%, 99.9% for sequence variants at F=0.5%, 1%, 5%, 10%, 20%, 50%, 100% at confidence levels of C=90%, 95%, 99%, and 99.9%.
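By way of non-limiting illustration, the relationship among S, F, and C described above can be sketched in Python as follows. The sketch assumes that each test run is summarized as a pair of the variant fraction in the sample and the confidence of the resulting call (or no call at all); the function name, the data layout, and the example runs are hypothetical and serve only to mirror the F=5%, C=99%, S=90% example.

```python
from typing import Iterable, Optional, Tuple

# Each observation: (fraction of sequences carrying the variant,
#                    confidence of the call, or None if the variant was not called).
Observation = Tuple[float, Optional[float]]


def variant_sensitivity(observations: Iterable[Observation],
                        min_fraction: float,
                        min_confidence: float) -> float:
    """Fraction of eligible samples (variant present at >= min_fraction)
    in which the variant was detected at >= min_confidence."""
    eligible = [conf for frac, conf in observations if frac >= min_fraction]
    if not eligible:
        raise ValueError("no sample contains the variant at the required fraction")
    detected = sum(1 for conf in eligible
                   if conf is not None and conf >= min_confidence)
    return detected / len(eligible)


# Hypothetical runs: ten samples with the variant at 5%, nine of them called
# at 99% confidence and one missed, plus one sample below the 5% fraction
# that is excluded from the calculation.
runs = [(0.05, 0.99)] * 9 + [(0.05, None)] + [(0.01, 0.99)]
s = variant_sensitivity(runs, min_fraction=0.05, min_confidence=0.99)
print(f"S = {s:.0%} for F = 5% at C = 99%")
```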

As used herein, the term “specificity” refers to the ability of a method to detect the presence of a disease in a subject (e.g., the specificity of a method can be described as the ability of the method to identify the true positive over true negative rate in a subject and/or to distinguish a truly occurring sequence variant from a sequencing artifact or other closely related sequences). For example, when used in reference to any of the variety of methods described herein that can detect the presence of cancer in a subject, a high specificity means that the method correctly identifies the absence of cancer in the subject a large percentage of the time (e.g., the method does not incorrectly identify the presence of cancer in the subject a large percentage of the time). A method has a specificity of X % if, when applied to a sample set of N_(Total) sequences, in which X_(True) sequences are truly variant and X_(Not true) are not truly variant, the method can select at least X % of the not truly variant as not variant. For example, a method has a specificity of 90% if, when applied to a sample set of 1,000 sequences, in which 500 sequences are truly variant and 500 are not truly variant, the method selects 90% of the 500 not truly variant sequences as not variant. For example, a method described herein that correctly detects the absence of cancer in a subject 95% of the time the method is performed is said to have a specificity of 95%. In some embodiments, a method described herein that can detect the absence of cancer in a subject provides a specificity of at least 80% (e.g., at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, 99.5%, or higher). A method having high specificity results in minimal or no false positive results (e.g., as compared to other methods). False positive results can arise from any source. For example, in various methods described herein that correctly detect the absence of cancer and include sequencing a nucleic acid, false positives can result from errors introduced into the sequence of interest during sample preparation, sequencing errors, and/or inadvertent sequencing of closely related sequences such as pseudo-genes or members of a gene family. In some embodiments, methods provided herein that include detecting the presence of one or more members of two or more classes of biomarkers (e.g., genetic biomarkers and/or protein biomarkers) provide a higher specificity than methods that include detecting the presence of one or more members of only one class of biomarkers.
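By way of non-limiting illustration, the 1,000-sequence specificity example described above can be worked through in Python. The function and variable names are hypothetical; the counts are those of the example (500 truly variant sequences, 500 not truly variant sequences, and 90% of the latter selected as not variant).

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of not-truly-variant items correctly selected as not variant."""
    negatives = true_negatives + false_positives
    if negatives <= 0:
        raise ValueError("at least one not-truly-variant item is required")
    return true_negatives / negatives


n_total = 1000                               # sequences in the sample set
n_true_variant = 500                         # truly variant sequences
n_not_variant = n_total - n_true_variant     # not truly variant sequences
called_not_variant = 450                     # 90% of the 500 selected as not variant
false_positives = n_not_variant - called_not_variant

spec = specificity(called_not_variant, false_positives)
print(f"specificity: {spec:.0%}")
print(f"false positive rate: {1 - spec:.0%} ({false_positives} sequences)")
```

Only the not truly variant sequences enter the calculation; the number of truly variant sequences affects sensitivity rather than specificity.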

As used herein, the term “subject” is used interchangeably with the term “patient” and means a vertebrate, including any member of the class Mammalia, including humans, domestic and farm animals, and zoo, sports or pet animals, such as mouse, rabbit, pig, sheep, goat, cattle, horse (e.g., race horse), and higher primates. In some embodiments, the subject is a human. In some embodiments, the subject has a disease. In some embodiments, the subject has cancer. In some embodiments, the subject has not been determined to have a cancer. In some embodiments, the subject has not exhibited a symptom associated with a cancer. In some embodiments, the subject is a human harboring a cancer cell. In some embodiments, the subject is a human harboring a cancer cell, but is not known to harbor the cancer cell. In some embodiments, the subject has a viral disease. In some embodiments, the subject has a bacterial disease. In some embodiments, the subject has a fungal disease. In some embodiments, the subject has a parasitic disease. In some embodiments, the subject has asthma. In some embodiments, the subject has an autoimmune disease. In some embodiments, the subject has graft vs. host disease. In some embodiments, a subject can be a pediatric subject. For example, the subject can be a pediatric human that is under the age of 18 years (e.g., from about 6 months to about 18 years of age, such as about 1, about 3, about 5, about 8, about 10, about 12, about 15, or about 17 years of age). In some embodiments, a subject can be an adult subject. For example, the subject can be an adult human that is 18 years of age or older (e.g., about 18, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 55, about 60, about 65, about 70, about 75, about 80 years of age, or older).

As used herein, the term “treatment” is used interchangeably with the phrase “therapeutic intervention.”

Methods of testing DNA isolated or obtained from white blood cells (e.g., white blood cell clones arising during age-associated clonal hematopoiesis (e.g., clonal hematopoietic expansion, also known as clonal hematopoiesis of indeterminate potential or CHIP) or myelodysplasia) for the presence or absence of a genetic mutation that is associated with cancer in order to determine whether that genetic alteration originates from a cancer cell in the subject are generically described herein as “verifying a genetic alteration against white blood cells”, “verifying a genetic alteration against DNA from white blood cells”, “white blood cell verification”, and similar phrases.

Overview

In general, methods and materials for detecting or identifying the presence of cancer in a subject with high sensitivity and specificity as compared to conventional methods of identifying the presence of cancer in a subject are provided herein. In some embodiments, methods provided herein for identifying the presence of cancer in a subject with high sensitivity and specificity are performed on a liquid sample(s) obtained from the subject (e.g., blood, plasma, or serum), whereas conventional methods of identifying the presence of cancer in a subject do not achieve the level of sensitivity, the level of specificity, or both when performed on a liquid sample obtained from the subject. In some embodiments, methods provided herein for identifying the presence of cancer in a subject with high sensitivity and specificity are performed prior to having determined that the subject already suffers from cancer, prior to having determined that the subject harbors a cancer cell, and/or prior to the subject exhibiting symptoms associated with cancer. Thus, in some embodiments, methods provided herein for identifying the presence of cancer in a subject with high sensitivity and specificity are used as a first-line detection method, and not simply as a confirmation (e.g., an “overcall”) of another detection method that the subject has cancer.

In some embodiments, methods and materials provided herein provide high sensitivity in the detection or diagnosis of cancer (e.g., a high frequency or incidence of correctly identifying a subject as having cancer). In some embodiments, methods and materials provided herein provide a sensitivity of at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. In some embodiments, methods and materials provided herein provide high sensitivity in detecting a single type of cancer. In some embodiments, methods and materials provided herein provide high sensitivity in detecting two or more types of cancers. Any of a variety of cancer types can be detected using methods and materials provided herein (see, e.g., the section entitled “Cancers”). In some embodiments, cancers that can be detected using methods and materials provided herein include pancreatic cancer. In some embodiments, cancers that can be detected using methods and materials provided herein include liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, or breast cancer. In some embodiments, cancers that can be detected using methods and materials provided herein include cancers of the female reproductive tract (e.g., cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer). In some embodiments, cancers that can be detected using methods and materials provided herein include bladder cancer or upper-tract urothelial carcinomas.

In some embodiments, methods and materials provided herein provide high specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer). In some embodiments, methods and materials provided herein provide a specificity of at least about 10%, at least about 15%, at least about 20%, at least about 25%, at least about 30%, at least about 35%, at least about 40%, at least about 45%, at least about 50%, at least about 55%, at least about 60%, at least about 65%, at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. As will be understood by those of ordinary skill in the art, a specificity of 99% means that only 1% of subjects that do not have cancer are incorrectly identified as having cancer. In some embodiments, methods and materials provided herein provide high specificity in detecting a single cancer (e.g., there is a low probability of incorrectly identifying that subject as having that single cancer type). In some embodiments, methods and materials provided herein provide high specificity in detecting two or more cancers (e.g., there is a low probability of incorrectly identifying that subject as having those two or more cancer types).

As will be appreciated by those of ordinary skill in the art, an appropriate sensitivity or specificity in the detection or diagnosis of cancer can be chosen based on a variety of factors. As one non-limiting example, a method designed to provide a lower specificity in the detection or diagnosis of cancer can be designed to have an increased sensitivity. As another non-limiting example, a method designed to provide an increased specificity in the detection or diagnosis of cancer can be designed to have a lower sensitivity. In some embodiments, even a low sensitivity can be advantageous (e.g., in screening a population that is not normally screened). In some embodiments, in populations where cancer (e.g., a particular type of cancer) is prevalent, a method to detect or diagnose the presence of cancer can be designed to have a relatively high sensitivity, even at the cost of decreased specificity. In some embodiments, the sensitivity and specificity of various detection methods provided herein are determined based on the prevalence of the disease in a specific patient population. For example, screening tests for a general patient population not known to have cancer can be chosen to have high specificity (so as to eliminate false positive diagnoses and unnecessary further diagnostic testing and/or monitoring). As another example, screening tests for high risk populations (e.g., populations in which the risk of having or developing cancer is higher than the general population overall, e.g., due to the population engaging or having engaged in risky behaviors, having risky family histories, experiencing or having experienced risky environments, and the like) can be chosen to have high sensitivity (in order to provide greater certainty of detecting a cancer that is present, even at the expense of additional further diagnostic testing and/or monitoring that may not be appropriate for the general population).
As one non-limiting example, a test with 90% sensitivity and 95% specificity will have a positive predictive value (PPV) of about 15% and a negative predictive value (NPV) of >99% in a population with a prevalence of 1%, while both predictive values are greater than 92% if the prevalence is 40% (high risk population). PPV can be calculated as follows: # true positives (TP)/(# true positives + # false positives). PPV can also be calculated as follows: (sensitivity×prevalence)/((sensitivity×prevalence)+((1−specificity)×(1−prevalence))). NPV can be calculated as follows: # true negatives/# of negative calls. NPV can also be calculated as follows: (specificity×(1−prevalence))/(((1−sensitivity)×prevalence)+(specificity×(1−prevalence))). See, e.g., Lalkhen and McCluskey, Clinical tests: sensitivity and specificity, Continuing Education in Anaesthesia, Critical Care & Pain, Volume 8, 2008, incorporated herein by reference in its entirety.
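The dependence of the predictive values on prevalence can be checked numerically. The following is a minimal sketch that evaluates the standard PPV and NPV expressions for a test with 90% sensitivity and 95% specificity at two prevalences; the function name `predictive_values` and its argument names are assumptions made for illustration only.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population.

    All three inputs are fractions between 0 and 1 (e.g., 0.90 for 90%).
    Hypothetical helper for illustration; not a function from the source text.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    true_pos = sensitivity * prevalence                # diseased, test positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # healthy, test positive
    true_neg = specificity * (1.0 - prevalence)        # healthy, test negative
    false_neg = (1.0 - sensitivity) * prevalence       # diseased, test negative

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same test, two populations: 1% prevalence and 40% prevalence.
for prevalence in (0.01, 0.40):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence={prevalence:.0%}: PPV={ppv:.3f}, NPV={npv:.3f}")
# prevalence=1%: PPV=0.154, NPV=0.999
# prevalence=40%: PPV=0.923, NPV=0.934
```

The sketch illustrates why a test with fixed sensitivity and specificity yields a far higher PPV in a high risk population than in a general screening population.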

Methods of Detecting

Provided herein are methods and materials for detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from a subject. In some embodiments, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of both simultaneous and sequential testing for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).

Any of the variety of detection methods described herein (see, e.g., sections entitled “Detection of Genetic Biomarkers”, “Detection of Protein Biomarkers”, and “Detection of Aneuploidy”) can be used to detect the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from a subject. In some embodiments, the one or more members of the one or more classes of biomarkers and/or the one or more classes of biomarkers are associated with a disease in a subject. In some embodiments, aneuploidy is associated with a disease in a subject. In some embodiments, the disease is cancer (e.g., any of the variety of types of cancer described herein). In some embodiments, the one or more members are members of a class of genetic biomarkers. In some embodiments, the one or more members are members of a class of protein biomarkers. In some embodiments, methods that include detecting the presence of one or more members of one or more classes of biomarkers in a sample obtained from the subject further include detecting the presence of aneuploidy in a sample obtained from the subject. For example, methods that include detecting the presence of one or more members of a class of genetic biomarkers in a sample obtained from a subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject). As another example, methods that include detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from a subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject).
In some embodiments, methods that include detecting both the presence of one or more members of a class of genetic biomarkers and detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from a subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein include detecting the presence of aneuploidy in one or more samples obtained from a subject. In some embodiments, methods provided herein include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers). In some embodiments, methods provided herein include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can be tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy. Alternatively, two or more samples can be obtained from a subject, and each of the two or more samples can be individually tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy. As one non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers), and a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers). As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a class of biomarkers (e.g., genetic biomarkers or protein biomarkers), and a second sample obtained from the subject can be tested to detect the presence of aneuploidy. As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers) and to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), while a second sample obtained from the subject can be tested to detect the presence of aneuploidy. As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic or protein biomarkers) and to detect the presence of aneuploidy, while a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., a class of biomarkers that is different from the first class that is tested for in the first sample).

In some embodiments, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy (e.g., detected by any of the variety of methods disclosed herein) in a sample obtained from a subject is associated with a disease and indicates the subject suffers from that disease. In some embodiments, a subject is diagnosed with a disease when the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy (which biomarkers and/or aneuploidy are associated with a disease) in a sample obtained from a subject is detected. In some embodiments, the disease is cancer (e.g., any of the variety of cancers described herein). In some embodiments, a subject is not known to have a disease (e.g., cancer) prior to detecting the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy. In some embodiments, a subject is not known to harbor a cancer cell prior to detecting the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy. In some embodiments, a subject does not exhibit symptoms associated with cancer prior to detecting the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy.

Methods of Diagnosis

Also provided herein are methods and materials for diagnosing or identifying the presence of a disease in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject. In some embodiments of diagnosing or identifying the presence of a disease in a subject (e.g., identifying the subject as having cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of diagnosing or identifying the presence of a disease in a subject (e.g., identifying the subject as having cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of diagnosing or identifying the presence of a disease in a subject (e.g., identifying the subject as having cancer) that include either simultaneous or sequential testing (or both) for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).

Any of the variety of detection methods described herein (see, e.g., sections entitled “Detection of Genetic Biomarkers”, “Detection of Protein Biomarkers”, and “Detection of Aneuploidy”) can be used to detect the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from a subject. In some embodiments, the one or more members of the one or more classes of biomarkers and/or the one or more classes of biomarkers are associated with a disease in a subject. In some embodiments, aneuploidy is associated with a disease in a subject. In some embodiments, the disease is cancer (e.g., any of the variety of types of cancer described herein). In some embodiments, the one or more members are members of a class of genetic biomarkers. In some embodiments, the one or more members are members of a class of protein biomarkers. In some embodiments, methods that include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of one or more classes of biomarkers in a sample obtained from the subject further include detecting the presence of aneuploidy in a sample obtained from the subject. For example, methods that include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of a class of genetic biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject).
As another example, methods that include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject). In some embodiments, methods that include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting both the presence of one or more members of a class of genetic biomarkers and detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers and protein biomarkers). In some embodiments, methods provided herein include diagnosing the presence of cancer in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can be tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be diagnosed as having cancer (e.g., is identified as having cancer) when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected. Alternatively, two or more samples can be obtained from a subject, and each of the two or more samples can be individually tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be diagnosed as having cancer (e.g., is identified as having cancer) when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected. As one non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers), and a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), wherein the subject is diagnosed as having cancer (e.g., is identified as having cancer) when the presence of the one or more members of the first class of biomarkers is detected and/or the presence of the one or more members of the second class of biomarkers is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected and the presence of the one or more members of the second class of biomarkers is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a class of biomarkers (e.g., genetic biomarkers or protein biomarkers), and a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is diagnosed as having cancer (e.g., is identified as having cancer) when the presence of the one or more members of the class of biomarkers is detected and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the class of biomarkers is detected and the presence of aneuploidy is detected). As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers) and to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), while a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is diagnosed as having cancer (e.g., is identified as having cancer) when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic or protein biomarkers) and to detect the presence of aneuploidy, while a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., a class of biomarkers that is different from the first class that is tested for in the first sample), wherein the subject is diagnosed as having cancer (e.g., is identified as having cancer) when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
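The "and/or" language in the examples above covers two different ways of combining findings: calling a subject positive when any one finding is detected, or only when every tested finding is detected. The following is a purely illustrative sketch of that distinction. The class `Findings`, the function `identify_as_having_cancer`, and the `require_all` flag are hypothetical names introduced here; the boolean inputs stand in for the outcome of whatever detection methods are used, and the sketch is not a clinical decision rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Findings:
    """Hypothetical container for per-subject detection outcomes.

    Results may come from one sample or from two or more samples
    obtained from the same subject.
    """
    genetic_biomarker_detected: bool
    protein_biomarker_detected: bool
    aneuploidy_detected: bool


def identify_as_having_cancer(findings: Findings, require_all: bool = False) -> bool:
    """Combine findings under an 'or' rule (default) or an 'and' rule."""
    results = (
        findings.genetic_biomarker_detected,
        findings.protein_biomarker_detected,
        findings.aneuploidy_detected,
    )
    # 'or' rule: any single finding suffices; 'and' rule: every finding is required.
    return all(results) if require_all else any(results)


subject = Findings(genetic_biomarker_detected=True,
                   protein_biomarker_detected=False,
                   aneuploidy_detected=True)
print(identify_as_having_cancer(subject))                    # True
print(identify_as_having_cancer(subject, require_all=True))  # False
```

In general, an "or" rule tends to favor sensitivity, while an "and" rule tends to favor specificity.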

In some embodiments of diagnosing or identifying the presence of a disease (e.g., cancer) in a subject (e.g., using any of the variety of methods described herein), the subject is also identified as a candidate for further diagnostic testing. In some embodiments of diagnosing or identifying the presence of a disease (e.g., cancer) in a subject (e.g., using any of the variety of methods described herein), the subject is also identified as a candidate for increased monitoring. In some embodiments of diagnosing or identifying the presence of a disease (e.g., cancer) in a subject (e.g., using any of the variety of methods described herein), the subject is also identified as a candidate that will or is likely to respond to a treatment (e.g., any of the variety of therapeutic interventions described herein). In some embodiments of diagnosing or identifying the presence of a disease (e.g., cancer) in a subject (e.g., using any of the variety of methods described herein), the subject is also administered a treatment (e.g., any of the variety of therapeutic interventions described herein).

Methods of Identifying a Subject as being at Risk of Having or Developing a Disease

Provided herein are methods and materials for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject. In some embodiments of identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) that include either simultaneous or sequential testing (or both) for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).

Any of the variety of detection methods described herein (see, e.g., sections entitled “Detection of Genetic Biomarkers”, “Detection of Protein Biomarkers”, and “Detection of Aneuploidy”) can be used to detect the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from a subject. In some embodiments, the one or more members of the one or more classes of biomarkers and/or the one or more classes of biomarkers are associated with a disease in a subject. In some embodiments, aneuploidy is associated with a disease in a subject. In some embodiments, the disease is cancer (e.g., any of the variety of types of cancer described herein). In some embodiments, the one or more members are members of a class of genetic biomarkers. In some embodiments, the one or more members are members of a class of protein biomarkers. In some embodiments, methods that include identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of one or more classes of biomarkers in a sample obtained from the subject further include detecting the presence of aneuploidy in a sample obtained from the subject. For example, methods that include identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a class of genetic biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject).
As another example, methods that include identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject). In some embodiments, methods that include identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting both the presence of one or more members of a class of genetic biomarkers and detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers and protein biomarkers). In some embodiments, methods provided herein for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from the subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can be tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected. Alternatively, two or more samples can be obtained from a subject, and each of the two or more samples can be individually tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected. As one non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers), and a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), wherein the subject is identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) when the presence of the one or more members of the first class of biomarkers is detected and/or the presence of the one or more members of the second class of biomarkers is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected and the presence of the one or more members of the second class of biomarkers is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a class of biomarkers (e.g., genetic biomarkers or protein biomarkers), and a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) when the presence of the one or more members of the class of biomarkers is detected and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the class of biomarkers is detected and the presence of aneuploidy is detected). As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers) and to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), while a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic or protein biomarkers) and to detect the presence of aneuploidy, while a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., a class of biomarkers that is different from the first class that is tested for in the first sample), wherein the subject is identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).

In some embodiments of identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., using any of the variety of methods described herein), the subject is also identified as a candidate for further diagnostic testing. In some embodiments of identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., using any of the variety of methods described herein), the subject is also identified as a candidate for increased monitoring. In some embodiments of identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., using any of the variety of methods described herein), the subject is also identified as a candidate that will or is likely to respond to a treatment (e.g., any of the variety of therapeutic interventions described herein including, without limitation, a chemopreventive). In some embodiments of identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., using any of the variety of methods described herein), the subject is also administered a treatment (e.g., any of the variety of therapeutic interventions described herein including, without limitation, a chemopreventive).

Methods of Treatment

Also provided herein are methods and materials for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject. In some embodiments of treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems).
In some embodiments of treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) that include either simultaneous or sequential testing (or both) for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).
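
The distinction drawn above between simultaneous and sequential testing, and between single-sample and multi-sample testing, can be illustrated with a minimal sketch. The record structure (AssayRecord), its field names, and the function describe_testing are hypothetical and are introduced only for illustration; they are not part of the disclosed methods. The sketch assumes that testing counts as simultaneous when all assays belong to one testing procedure, and as sequential when two or more procedures were conducted at two or more time points.

```python
from typing import NamedTuple


class AssayRecord(NamedTuple):
    """One assay run on one sample (hypothetical structure for illustration)."""
    procedure_id: str  # identifies the testing procedure the assay belongs to
    time_point: str    # when the procedure was conducted, e.g. an ISO date
    sample_id: str     # which sample from the subject was tested
    target: str        # e.g. "genetic", "protein", or "aneuploidy"


def describe_testing(records):
    """Return (timing, sampling) labels for a non-empty list of assay records."""
    if not records:
        raise ValueError("at least one assay record is required")

    procedures = {r.procedure_id for r in records}
    time_points = {r.time_point for r in records}
    samples = {r.sample_id for r in records}

    if len(procedures) == 1:
        # One procedure, even if it contains several discrete test methods.
        timing = "simultaneous"
    elif len(time_points) >= 2:
        # Two or more procedures conducted at two or more time points.
        timing = "sequential"
    else:
        # Several procedures at one time point: not covered by either example.
        timing = "unclassified"

    sampling = "single sample" if len(samples) == 1 else "two or more samples"
    return timing, sampling
```

For example, a genetic biomarker assay and an aneuploidy assay run on the same plasma sample within one procedure would be labeled ("simultaneous", "single sample") under these assumptions.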

Any of the variety of detection methods described herein (see, e.g., sections entitled “Detection of Genetic Biomarkers”, “Detection of Protein Biomarkers”, and “Detection of Aneuploidy”) can be used to detect the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from a subject. In some embodiments, the one or more members of the one or more classes of biomarkers and/or the one or more classes of biomarkers are associated with a disease in a subject. In some embodiments, aneuploidy is associated with a disease in a subject. In some embodiments, the disease is cancer (e.g., any of the variety of types of cancer described herein). In some embodiments, the one or more members are members of a class of genetic biomarkers. In some embodiments, the one or more members are members of a class of protein biomarkers. In some embodiments, methods that include treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of one or more classes of biomarkers in a sample obtained from the subject further include detecting the presence of aneuploidy in a sample obtained from the subject. For example, methods that include treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a class of genetic biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject).
As another example, methods that include treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject). In some embodiments, methods that include treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting both the presence of one or more members of a class of genetic biomarkers and the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of aneuploidy in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers).
In some embodiments, methods provided herein for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can be tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject can be treated when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected. Alternatively, two or more samples can be obtained from a subject, and each of the two or more samples can be individually tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be diagnosed or identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject can be treated when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected.
As one non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers), and a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject can be treated when the presence of the one or more members of the first class of biomarkers is detected and/or the presence of the one or more members of the second class of biomarkers is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected and the presence of the one or more members of the second class of biomarkers is detected). As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a class of biomarkers (e.g., genetic biomarkers or protein biomarkers), and a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject is treated when the presence of the one or more members of the class of biomarkers is detected and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the class of biomarkers is detected and the presence of aneuploidy is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers) and to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), while a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject is treated when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic or protein biomarkers) and to detect the presence of aneuploidy, while a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., a class of biomarkers that is different from the first class that is tested for in the first sample), wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject is treated when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
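
The "and/or" conditions recited in the non-limiting examples above can be sketched as a simple combination rule over per-sample findings. The following is a minimal illustration only, not a description of any particular assay or classifier. The names SampleResult and meets_rule and the rule labels "any" and "all" are hypothetical. The sketch assumes that each sample reports which biomarker classes had at least one member detected and whether aneuploidy was detected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleResult:
    """Findings from one sample (hypothetical structure for illustration)."""
    sample_id: str
    # Biomarker classes with at least one member detected, e.g. {"genetic"}.
    classes_detected: frozenset = frozenset()
    aneuploidy_detected: bool = False


def meets_rule(results, tested_classes, test_aneuploidy, rule="any"):
    """Combine findings from one or more samples obtained from a subject.

    rule="any": at least one tested signal was detected (the "or" reading).
    rule="all": every tested signal was detected (the "and" reading).
    """
    detected = set()
    aneuploidy = False
    for result in results:
        detected |= result.classes_detected
        aneuploidy = aneuploidy or result.aneuploidy_detected

    # One boolean per signal that was actually tested for.
    signals = [cls in detected for cls in tested_classes]
    if test_aneuploidy:
        signals.append(aneuploidy)

    if not signals:
        return False  # nothing was tested, so nothing can be detected
    if rule == "any":
        return any(signals)
    if rule == "all":
        return all(signals)
    raise ValueError("rule must be 'any' or 'all'")
```

Under these assumptions, a first sample in which genetic and protein biomarkers are detected and a second sample in which aneuploidy is detected would satisfy both the "any" and the "all" rule.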

In some embodiments of treating a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject, the treatment is any of the variety of therapeutic interventions disclosed herein including, without limitation, chemotherapy, neoadjuvant chemotherapy, radiation therapy, hormone therapy, cytotoxic therapy, immunotherapy, adoptive T cell therapy (e.g., chimeric antigen receptors and/or T cells having wild-type or modified T cell receptors), targeted therapy such as administration of kinase inhibitors (e.g., kinase inhibitors that target a particular genetic lesion, such as a translocation or mutation), (e.g., a kinase inhibitor, an antibody, a bispecific antibody), signal transduction inhibitors, bispecific antibodies or antibody fragments (e.g., BiTEs), monoclonal antibodies, immune checkpoint inhibitors, surgery (e.g., surgical resection), or any combination of the above. In some embodiments in which the disease is cancer, a therapeutic intervention reduces the severity of the cancer, reduces a symptom of the cancer, and/or reduces the number of cancer cells present within the subject.

In some embodiments of treating a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., by any of the variety of methods described herein), the subject is also identified as a subject who will or is likely to respond to that treatment. In some embodiments of treating a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., by any of the variety of methods described herein), the subject is also identified as a candidate for further diagnostic testing (e.g., prior to administration of the treatment and/or after administration of the treatment to determine the effect of that treatment and/or whether the subject is a candidate for additional administrations of the same or a different treatment). In some embodiments of treating a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., by any of the variety of methods described herein), the subject is also identified as a candidate for increased monitoring (e.g., prior to administration of the treatment and/or after administration of the treatment to determine the effect of that treatment and/or whether the subject is a candidate for additional administrations of the same or a different treatment).

Method of Identifying a Treatment

Also provided herein are methods and materials for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject. In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer), the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems).
In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) that include either simultaneous or sequential testing (or both) for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).

Any of the variety of detection methods described herein (see, e.g., sections entitled “Detection of Genetic Biomarkers”, “Detection of Protein Biomarkers”, and “Detection of Aneuploidy”) can be used to detect the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from a subject. In some embodiments, the one or more members of the one or more classes of biomarkers and/or the one or more classes of biomarkers are associated with a disease in a subject. In some embodiments, aneuploidy is associated with a disease in a subject. In some embodiments, the disease is cancer (e.g., any of the variety of types of cancer described herein). In some embodiments, the one or more members are members of a class of genetic biomarkers. In some embodiments, the one or more members are members of a class of protein biomarkers. In some embodiments, methods that include identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of one or more classes of biomarkers in a sample obtained from the subject further include detecting the presence of aneuploidy in a sample obtained from the subject. For example, methods that include identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a class of genetic biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject).
As another example, methods that include identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two different samples from the subject). In some embodiments, methods that include identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting both the presence of one or more members of a class of genetic biomarkers and the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of aneuploidy in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers).
In some embodiments, methods provided herein for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can be tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or a treatment for the subject can be identified when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected. Alternatively, two or more samples can be obtained from a subject, and each of the two or more samples can be individually tested to detect the presence of one or more members of one or more classes of biomarkers and/or for the presence of aneuploidy, and the subject can be diagnosed or identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or a treatment for the subject can be identified when the presence of the one or more members of the one or more classes of biomarkers and/or the presence of aneuploidy is detected.
As one non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers), and a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or a treatment for the subject is identified when the presence of the one or more members of the first class of biomarkers is detected and/or the presence of the one or more members of the second class of biomarkers is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected and the presence of the one or more members of the second class of biomarkers is detected). As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a class of biomarkers (e.g., genetic biomarkers or protein biomarkers), and a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or a treatment for the subject is identified when the presence of the one or more members of the class of biomarkers is detected and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the class of biomarkers is detected and the presence of aneuploidy is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers) and to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), while a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or a treatment for the subject is identified when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic or protein biomarkers) and to detect the presence of aneuploidy, while a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., a class of biomarkers that is different from the first class that is tested for in the first sample), wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or a treatment for the subject is identified when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).

In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject, the identified treatment is any of the variety of therapeutic interventions disclosed herein including, without limitation, chemotherapy, neoadjuvant chemotherapy, radiation therapy, hormone therapy, cytotoxic therapy, immunotherapy, adoptive T cell therapy (e.g., chimeric antigen receptors and/or T cells having wild-type or modified T cell receptors), targeted therapy such as administration of kinase inhibitors (e.g., kinase inhibitors that target a particular genetic lesion, such as a translocation or mutation), (e.g., a kinase inhibitor, an antibody, a bispecific antibody), signal transduction inhibitors, bispecific antibodies or antibody fragments (e.g., BiTEs), monoclonal antibodies, immune checkpoint inhibitors, surgery (e.g., surgical resection), or any combination of the above. In some embodiments in which the disease is cancer, an identified therapeutic intervention reduces the severity of the cancer, reduces a symptom of the cancer, and/or reduces the number of cancer cells present within the subject.

In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., by any of the variety of methods described herein), the subject is also identified as a subject who will or is likely to respond to that treatment. In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., by any of the variety of methods described herein), the subject is also identified as a candidate for further diagnostic testing. In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., by any of the variety of methods described herein), the subject is also identified as a candidate for increased monitoring. In some embodiments of identifying a treatment for a subject who has been diagnosed or identified as having a disease or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., by any of the variety of methods described herein), the subject is also administered a treatment (e.g., any of the variety of therapeutic interventions described herein).

Identifying a Subject Who Will or is Likely to Respond to a Treatment

Also provided herein are methods and materials for identifying a subject who will or is likely to respond to a treatment by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject. In some embodiments of identifying a subject who will or is likely to respond to a treatment, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject who will or is likely to respond to a treatment, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject who will or is likely to respond to a treatment that include either simultaneous or sequential testing (or both) for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).

Any of the variety of detection methods described herein (see, e.g.,sections entitled “Detection of Genetic Biomarkers”, “Detection ofProtein Biomarkers”, and “Detection of Aneuploidy”) can be used todetect the presence of one or more members of one or more classes ofbiomarkers and/or the presence of aneuploidy in a sample obtained from asubject. In some embodiments, the one or more members of the one or moreclasses of biomarkers and/or the one or more classes of biomarkers areassociated with a disease in a subject. In some embodiments, aneuploidyis associated with a disease in a subject. In some embodiments, thedisease is cancer (e.g., any of the variety of types of cancer describedherein). In some embodiments, the one or more members are members of aclass of genetic biomarkers. In some embodiments, the one or moremembers are members of a class of protein biomarkers. In someembodiments, methods that include identifying a subject who will or islikely to respond to a treatment by detecting the presence of one ormore members of one or more classes of biomarkers in a sample obtainedfrom the subject further include detecting the presence of aneuploidy ina sample obtained from the subject. For example, methods that includeidentifying a subject who will or is likely to respond to a treatment bydetecting the presence of one or more members of a class of geneticbiomarkers in a sample obtained from the subject can further includedetecting the presence of aneuploidy in a sample obtained from thesubject (e.g., the same sample or two different samples from thesubject). As another example, methods that include identifying a subjectwho will or is likely to respond to a treatment by detecting thepresence of one or more members of a class of protein biomarkers in asample obtained from the subject can further include detecting thepresence of aneuploidy in a sample obtained from the subject (e.g., thesame sample or two different samples from the subject). 
In some embodiments, methods that include identifying a subject who will or is likely to respond to a treatment by detecting both the presence of one or more members of a class of genetic biomarkers and the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein for identifying a subject who will or is likely to respond to a treatment include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for identifying a subject who will or is likely to respond to a treatment include detecting the presence of aneuploidy in one or more samples obtained from a subject. In some embodiments, methods provided herein for identifying a subject who will or is likely to respond to a treatment include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein for identifying a subject who will or is likely to respond to a treatment include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers). In some embodiments, methods provided herein for identifying a subject who will or is likely to respond to a treatment include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can betested to detect the presence of one or more members of one or moreclasses of biomarkers and/or for the presence of aneuploidy, and thesubject can be diagnosed or identified as having a disease (e.g.,cancer) or as being at risk (e.g., increased risk) of having ordeveloping a disease (e.g., cancer) and/or the subject can be identifiedas a subject who will or is likely to respond to a treatment when thepresence of the one or more members of the one or more classes ofbiomarkers and/or the presence of aneuploidy is detected. Alternatively,two or more samples can be obtained from a subject, and each of the twoor more samples can be individually tested to detect the presence of oneor more members of one or more classes of biomarkers and/or for thepresence of aneuploidy, and the subject can be diagnosed or identifiedas being at risk (e.g., increased risk) of having or developing adisease (e.g., cancer) and/or the subject can be identified as a subjectwho will or is likely to respond to a treatment when the presence of theone or more members of the one or more classes of biomarkers and/or thepresence of aneuploidy is detected. 
As one non-limiting example, a firstsample obtained from a subject can be tested to detect the presence ofone or more members of a first class of biomarkers (e.g., geneticbiomarkers), and a second sample obtained from the subject can be testedto detect the presence of one or more members of a second class ofbiomarkers (e.g., protein biomarkers), wherein the subject is diagnosedor identified as having a disease (e.g., cancer) or as being at risk(e.g., increased risk) of having or developing a disease (e.g., cancer)and/or the subject is identified as a subject who will or is likely torespond to a treatment when the presence of the one or more members ofthe first class of biomarkers is detected and/or the presence of the oneor more members of the second class of biomarkers is detected (e.g.,when the presence of the one or more members of the class of biomarkersis detected and the presence of the one or more members of the secondclass of biomarkers are detected). As another non-limiting example, afirst sample obtained from a subject can be tested to detect thepresence of one or more members of a class of biomarkers (e.g., geneticbiomarkers or protein biomarkers), and a second sample obtained from thesubject can be tested to detect the presence of aneuploidy, wherein thesubject is diagnosed or identified as having a disease (e.g., cancer) oras being at risk (e.g., increased risk) of having or developing adisease (e.g., cancer) and/or the subject is identified as a subject whowill or is likely to respond to a treatment when the presence of the oneor more members of the class of biomarkers is detected and/or thepresence aneuploidy is detected (e.g., when the presence of the one ormore members of the class of biomarkers is detected and the presenceaneuploidy are detected). 
As another non-limiting example, a firstsample obtained from a subject can be tested to detect the presence ofone or more members of a first class of biomarkers (e.g., geneticbiomarkers) and to detect the presence of one or more members of asecond class of biomarkers (e.g., protein biomarkers), while a secondsample obtained from the subject can be tested to detect the presence ofaneuploidy, wherein the subject is diagnosed or identified as having adisease (e.g., cancer) or as being at risk (e.g., increased risk) ofhaving or developing a disease (e.g., cancer) and/or the subject isidentified as a subject who will or is likely to respond to a treatmentwhen the presence of the one or more members of the first class ofbiomarkers is detected, the presence of the one or more members of thesecond class of biomarkers is detected, and/or the presence ofaneuploidy is detected (e.g., when the presence of the one or moremembers of the first class of biomarkers is detected, the presence ofthe one or more members of the second class of biomarkers is detected,and the presence of aneuploidy are detected). 
As another non-limitingexample, a first sample obtained from a subject can be tested to detectthe presence of one or more members of a first class of biomarkers(e.g., genetic or protein biomarkers) and to detect the presence ofaneuploidy, while a second sample obtained from the subject can betested to detect the presence of one or more members of a second classof biomarkers (e.g., a class of biomarkers that is different from thefirst class that is tested for in the first sample), wherein the subjectis diagnosed or identified as having a disease (e.g., cancer) or asbeing at risk (e.g., increased risk) of having or developing a disease(e.g., cancer) and/or the subject is identified as a subject who will oris likely to respond to a treatment when the presence of the one or moremembers of the first class of biomarkers is detected, the presence ofthe one or more members of the second class of biomarkers is detected,and/or the presence of aneuploidy is detected (e.g., when the presenceof the one or more members of the first class of biomarkers is detected,the presence of the one or more members of the second class ofbiomarkers is detected, and the presence of aneuploidy are detected).
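
As a minimal, non-limiting sketch of the "and/or" logic in the examples above, the following Python fragment pools detections across two or more samples from the same subject and then applies either an "or" rule (any tested analyte detected) or an "and" rule (every tested analyte detected). The names `SampleResult`, `pooled_detections`, `is_identified`, and the analyte labels are hypothetical and introduced here only for illustration; the sketch assumes each assay has already been reduced to a detected/not-detected call and says nothing about how those calls are made.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class SampleResult:
    """Hypothetical record of which analytes were detected in one sample."""
    detected: frozenset


def pooled_detections(samples: Iterable[SampleResult]) -> frozenset:
    """Union of everything detected across samples from the same subject."""
    pooled = set()
    for sample in samples:
        pooled |= sample.detected
    return frozenset(pooled)


def is_identified(samples: Iterable[SampleResult],
                  tested: Sequence[str],
                  require_all: bool = False) -> bool:
    """Apply the "and/or" rule over the analytes that were tested.

    require_all=False: any tested analyte being detected suffices ("or").
    require_all=True: every tested analyte must be detected ("and").
    """
    if not tested:
        return False  # nothing was tested, so nothing can be detected
    pooled = pooled_detections(samples)
    hits = [analyte in pooled for analyte in tested]
    return all(hits) if require_all else any(hits)


# First sample tested for genetic biomarkers (detected);
# second sample tested for aneuploidy (not detected).
first_sample = SampleResult(frozenset({"genetic_biomarker"}))
second_sample = SampleResult(frozenset())
tested = ("genetic_biomarker", "aneuploidy")

either = is_identified([first_sample, second_sample], tested)
both = is_identified([first_sample, second_sample], tested, require_all=True)
```

In this sketch the subject is identified under the "or" reading (`either`) but not under the "and" reading (`both`), corresponding to the "and/or" and the parenthetical "and" alternatives recited in the examples.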

In some embodiments of identifying a subject who will or is likely to respond to a treatment by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject, the subject is identified as a subject who will or is likely to respond to a treatment that is any of the variety of therapeutic interventions disclosed herein including, without limitation, chemotherapy, neoadjuvant chemotherapy, radiation therapy, hormone therapy, cytotoxic therapy, immunotherapy, adoptive T cell therapy (e.g., chimeric antigen receptors and/or T cells having wild-type or modified T cell receptors), targeted therapy (e.g., a kinase inhibitor, an antibody, or a bispecific antibody) such as administration of kinase inhibitors (e.g., kinase inhibitors that target a particular genetic lesion, such as a translocation or mutation), signal transduction inhibitors, bispecific antibodies or antibody fragments (e.g., BiTEs), monoclonal antibodies, immune checkpoint inhibitors, surgery (e.g., surgical resection), or any combination of the above. In some embodiments in which the disease is cancer, a subject that is identified as a subject who will or is likely to respond to an identified therapeutic intervention is identified as a subject in whom the therapeutic intervention will or is likely to reduce the severity of the cancer, reduce a symptom of the cancer, and/or reduce the number of cancer cells present within the subject.

In some embodiments, a subject identified as a subject who will or is likely to respond to a treatment (e.g., using any of the variety of methods described herein) is also identified for further diagnostic testing. In some embodiments, a subject identified as a subject who will or is likely to respond to a treatment (e.g., using any of the variety of methods described herein) is also identified for increased monitoring. Additionally or alternatively, a subject identified as a subject who will or is likely to respond to a treatment (e.g., using any of the variety of methods described herein) is also administered a treatment (e.g., any of the variety of therapeutic interventions described herein).

Methods of Identifying a Subject as a Candidate for Further Diagnostic Testing

Also provided herein are methods and materials for identifying a subject as a candidate for further diagnostic testing by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject. In some embodiments of identifying a subject as a candidate for further diagnostic testing, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject as a candidate for further diagnostic testing, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject as a candidate for further diagnostic testing that include either simultaneous or sequential testing (or both) for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).

Any of the variety of detection methods described herein (see, e.g.,sections entitled “Detection of Genetic Biomarkers”, “Detection ofProtein Biomarkers”, and “Detection of Aneuploidy”) can be used todetect the presence of one or more members of one or more classes ofbiomarkers and/or the presence of aneuploidy in a sample obtained from asubject. In some embodiments, the one or more members of the one or moreclasses of biomarkers and/or the one or more classes of biomarkers areassociated with a disease in a subject. In some embodiments, aneuploidyis associated with a disease in a subject. In some embodiments, thedisease is cancer (e.g., any of the variety of types of cancer describedherein). In some embodiments, the one or more members are members of aclass of genetic biomarkers. In some embodiments, the one or moremembers are members of a class of protein biomarkers. In someembodiments, methods that include identifying a subject as a candidatefor further diagnostic testing by detecting the presence of one or moremembers of one or more classes of biomarkers in a sample obtained fromthe subject further include detecting the presence of aneuploidy in asample obtained from the subject. For example, methods that includeidentifying a subject as a candidate for further diagnostic testing bydetecting the presence of one or more members of a class of geneticbiomarkers in a sample obtained from the subject can further includedetecting the presence of aneuploidy in a sample obtained from thesubject (e.g., the same sample or two different samples from thesubject). As another example, methods that include identifying a subjectas a candidate for further diagnostic testing by detecting the presenceof one or more members of a class of protein biomarkers in a sampleobtained from the subject can further include detecting the presence ofaneuploidy in a sample obtained from the subject (e.g., the same sampleor two different samples from the subject). 
In some embodiments, methods that include identifying a subject as a candidate for further diagnostic testing by detecting both the presence of one or more members of a class of genetic biomarkers and the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein for identifying a subject as a candidate for further diagnostic testing include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for identifying a subject as a candidate for further diagnostic testing include detecting the presence of aneuploidy in one or more samples obtained from a subject. In some embodiments, methods provided herein for identifying a subject as a candidate for further diagnostic testing include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein for identifying a subject as a candidate for further diagnostic testing include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers). In some embodiments, methods provided herein for identifying a subject as a candidate for further diagnostic testing include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can betested to detect the presence of one or more members of one or moreclasses of biomarkers and/or for the presence of aneuploidy, and thesubject can be diagnosed or identified as having a disease (e.g.,cancer) or as being at risk (e.g., increased risk) of having ordeveloping a disease (e.g., cancer) and/or the subject can be identifiedas a subject who is a candidate for further diagnostic testing when thepresence of the one or more members of the one or more classes ofbiomarkers and/or the presence of aneuploidy is detected. Alternatively,two or more samples can be obtained from a subject, and each of the twoor more samples can be individually tested to detect the presence of oneor more members of one or more classes of biomarkers and/or for thepresence of aneuploidy, and the subject can be diagnosed or identifiedas being at risk (e.g., increased risk) of having or developing adisease (e.g., cancer) and/or the subject can be identified as a subjectwho is a candidate for further diagnostic testing when the presence ofthe one or more members of the one or more classes of biomarkers and/orthe presence of aneuploidy is detected. 
As one non-limiting example, afirst sample obtained from a subject can be tested to detect thepresence of one or more members of a first class of biomarkers (e.g.,genetic biomarkers), and a second sample obtained from the subject canbe tested to detect the presence of one or more members of a secondclass of biomarkers (e.g., protein biomarkers), wherein the subject isdiagnosed or identified as having a disease (e.g., cancer) or as beingat risk (e.g., increased risk) of having or developing a disease (e.g.,cancer) and/or the subject is identified as a subject who is a candidatefor further diagnostic testing when the presence of the one or moremembers of the first class of biomarkers is detected and/or the presenceof the one or more members of the second class of biomarkers is detected(e.g., when the presence of the one or more members of the class ofbiomarkers is detected and the presence of the one or more members ofthe second class of biomarkers are detected). As another non-limitingexample, a first sample obtained from a subject can be tested to detectthe presence of one or more members of a class of biomarkers (e.g.,genetic biomarkers or protein biomarkers), and a second sample obtainedfrom the subject can be tested to detect the presence of aneuploidy,wherein the subject is diagnosed or identified as having a disease(e.g., cancer) or as being at risk (e.g., increased risk) of having ordeveloping a disease (e.g., cancer) and/or the subject is identified asa subject who is a candidate for further diagnostic testing when thepresence of the one or more members of the class of biomarkers isdetected and/or the presence aneuploidy is detected (e.g., when thepresence of the one or more members of the class of biomarkers isdetected and the presence aneuploidy are detected). 
As anothernon-limiting example, a first sample obtained from a subject can betested to detect the presence of one or more members of a first class ofbiomarkers (e.g., genetic biomarkers) and to detect the presence of oneor more members of a second class of biomarkers (e.g., proteinbiomarkers), while a second sample obtained from the subject can betested to detect the presence of aneuploidy, wherein the subject isdiagnosed or identified as having a disease (e.g., cancer) or as beingat risk (e.g., increased risk) of having or developing a disease (e.g.,cancer) and/or the subject is identified as a subject who is a candidatefor further diagnostic testing when the presence of the one or moremembers of the first class of biomarkers is detected, the presence ofthe one or more members of the second class of biomarkers is detected,and/or the presence of aneuploidy is detected (e.g., when the presenceof the one or more members of the first class of biomarkers is detected,the presence of the one or more members of the second class ofbiomarkers is detected, and the presence of aneuploidy are detected). 
Asanother non-limiting example, a first sample obtained from a subject canbe tested to detect the presence of one or more members of a first classof biomarkers (e.g., genetic or protein biomarkers) and to detect thepresence of aneuploidy, while a second sample obtained from the subjectcan be tested to detect the presence of one or more members of a secondclass of biomarkers (e.g., a class of biomarkers that is different fromthe first class that is tested for in the first sample), wherein thesubject is diagnosed or identified as having a disease (e.g., cancer) oras being at risk (e.g., increased risk) of having or developing adisease (e.g., cancer) and/or the subject is identified as a subject whois a candidate for further diagnostic testing when the presence of theone or more members of the first class of biomarkers is detected, thepresence of the one or more members of the second class of biomarkers isdetected, and/or the presence of aneuploidy is detected (e.g., when thepresence of the one or more members of the first class of biomarkers isdetected, the presence of the one or more members of the second class ofbiomarkers is detected, and the presence of aneuploidy are detected).

In some embodiments of identifying a subject as a candidate for further diagnostic testing by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject, the subject is identified as a candidate for any of the variety of types of further diagnostic testing disclosed herein including, without limitation, a scan (e.g., a computed tomography (CT), a CT angiography (CTA), an esophagram (a Barium swallow), a Barium enema, a magnetic resonance imaging (MRI), a PET scan, a positron emission tomography and computed tomography (PET-CT) scan, an ultrasound (e.g., an endobronchial ultrasound, an endoscopic ultrasound), an X-ray, or a DEXA scan) or a physical examination (e.g., an anoscopy, a bronchoscopy (e.g., an autofluorescence bronchoscopy, a white-light bronchoscopy, a navigational bronchoscopy), a colonoscopy, a digital breast tomosynthesis, an endoscopic retrograde cholangiopancreatography (ERCP), an esophagogastroduodenoscopy, a mammography, a Pap smear, or a pelvic exam).

In some embodiments, a subject identified as a candidate for further diagnostic testing (e.g., using any of the variety of methods described herein) is also identified as a candidate for increased monitoring. Additionally or alternatively, a subject identified as a candidate for further diagnostic testing (e.g., using any of the variety of methods described herein) is also identified as a subject who will or is likely to respond to a treatment. Additionally or alternatively, a subject identified as a candidate for further diagnostic testing (e.g., using any of the variety of methods described herein) is also administered a treatment (e.g., any of the variety of therapeutic interventions described herein).

Methods of Identifying a Subject as a Candidate for Increased Monitoring

Also provided herein are methods and materials for identifying a subject as a candidate for increased monitoring by detecting the presence of one or more members (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more members) of one or more classes of biomarkers and/or the presence of aneuploidy in a sample obtained from the subject. In some embodiments of identifying a subject as a candidate for increased monitoring, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested simultaneously (e.g., in one testing procedure, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject as a candidate for increased monitoring, the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy are tested sequentially (e.g., in two or more different testing procedures conducted at two or more different time points, including embodiments in which the testing procedure itself may include multiple discrete test methods or systems). In some embodiments of identifying a subject as a candidate for increased monitoring that include either simultaneous or sequential testing (or both) for the presence of one or more members of one or more classes of biomarkers and/or the presence of aneuploidy, the testing may be performed on a single sample or may be performed on two or more different samples (e.g., two or more different samples obtained from the same subject).

Any of the variety of detection methods described herein (see, e.g.,sections entitled “Detection of Genetic Biomarkers”, “Detection ofProtein Biomarkers”, and “Detection of Aneuploidy”) can be used todetect the presence of one or more members of one or more classes ofbiomarkers and/or the presence of aneuploidy in a sample obtained from asubject. In some embodiments, the one or more members of the one or moreclasses of biomarkers and/or the one or more classes of biomarkers areassociated with a disease in a subject. In some embodiments, aneuploidyis associated with a disease in a subject. In some embodiments, thedisease is cancer (e.g., any of the variety of types of cancer describedherein). In some embodiments, the one or more members are members of aclass of genetic biomarkers. In some embodiments, the one or moremembers are members of a class of protein biomarkers. In someembodiments, methods that include identifying a subject as a candidatefor increased monitoring by detecting the presence of one or moremembers of one or more classes of biomarkers in a sample obtained fromthe subject further include detecting the presence of aneuploidy in asample obtained from the subject. For example, methods that includeidentifying a subject as a candidate for increased monitoring bydetecting the presence of one or more members of a class of geneticbiomarkers in a sample obtained from the subject can further includedetecting the presence of aneuploidy in a sample obtained from thesubject (e.g., the same sample or two different samples from thesubject). As another example, methods that include identifying a subjectas a candidate for increased monitoring by detecting the presence of oneor more members of a class of protein biomarkers in a sample obtainedfrom the subject can further include detecting the presence ofaneuploidy in a sample obtained from the subject (e.g., the same sampleor two different samples from the subject). 
In some embodiments, methods that include identifying a subject as a candidate for increased monitoring by detecting both the presence of one or more members of a class of genetic biomarkers and the presence of one or more members of a class of protein biomarkers in a sample obtained from the subject can further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample or two or more different samples from the subject).

In some embodiments, methods provided herein for identifying a subject as a candidate for increased monitoring include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers). In some embodiments, methods provided herein for identifying a subject as a candidate for increased monitoring include detecting the presence of aneuploidy in one or more samples obtained from a subject. In some embodiments, methods provided herein for identifying a subject as a candidate for increased monitoring include detecting the presence of one or more members of a single class of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers or protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject. In some embodiments, methods provided herein for identifying a subject as a candidate for increased monitoring include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers). In some embodiments, methods provided herein for identifying a subject as a candidate for increased monitoring include detecting the presence of one or more members of two or more classes of biomarkers in one or more samples obtained from a subject (e.g., genetic biomarkers and protein biomarkers) and detecting the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, a single sample obtained from a subject can betested to detect the presence of one or more members of one or moreclasses of biomarkers and/or for the presence of aneuploidy, and thesubject can be diagnosed or identified as having a disease (e.g.,cancer) or as being at risk (e.g., increased risk) of having ordeveloping a disease (e.g., cancer) and/or the subject can be identifiedas a subject who is a candidate for increased monitoring when thepresence of the one or more members of the one or more classes ofbiomarkers and/or the presence of aneuploidy is detected. Alternatively,two or more samples can be obtained from a subject, and each of the twoor more samples can be individually tested to detect the presence of oneor more members of one or more classes of biomarkers and/or for thepresence of aneuploidy, and the subject can be diagnosed or identifiedas being at risk (e.g., increased risk) of having or developing adisease (e.g., cancer) and/or the subject can be identified as a subjectwho is a candidate for increased monitoring when the presence of the oneor more members of the one or more classes of biomarkers and/or thepresence of aneuploidy is detected. 
As one non-limiting example, a firstsample obtained from a subject can be tested to detect the presence ofone or more members of a first class of biomarkers (e.g., geneticbiomarkers), and a second sample obtained from the subject can be testedto detect the presence of one or more members of a second class ofbiomarkers (e.g., protein biomarkers), wherein the subject is diagnosedor identified as having a disease (e.g., cancer) or as being at risk(e.g., increased risk) of having or developing a disease (e.g., cancer)and/or the subject is identified as a subject who is a candidate forincreased monitoring when the presence of the one or more members of thefirst class of biomarkers is detected and/or the presence of the one ormore members of the second class of biomarkers is detected (e.g., whenthe presence of the one or more members of the class of biomarkers isdetected and the presence of the one or more members of the second classof biomarkers are detected). As another non-limiting example, a firstsample obtained from a subject can be tested to detect the presence ofone or more members of a class of biomarkers (e.g., genetic biomarkersor protein biomarkers), and a second sample obtained from the subjectcan be tested to detect the presence of aneuploidy, wherein the subjectis diagnosed or identified as having a disease (e.g., cancer) or asbeing at risk (e.g., increased risk) of having or developing a disease(e.g., cancer) and/or the subject is identified as a subject who is acandidate for increased monitoring when the presence of the one or moremembers of the class of biomarkers is detected and/or the presenceaneuploidy is detected (e.g., when the presence of the one or moremembers of the class of biomarkers is detected and the presenceaneuploidy are detected). 
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic biomarkers) and to detect the presence of one or more members of a second class of biomarkers (e.g., protein biomarkers), while a second sample obtained from the subject can be tested to detect the presence of aneuploidy, wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject is identified as a subject who is a candidate for increased monitoring when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
As another non-limiting example, a first sample obtained from a subject can be tested to detect the presence of one or more members of a first class of biomarkers (e.g., genetic or protein biomarkers) and to detect the presence of aneuploidy, while a second sample obtained from the subject can be tested to detect the presence of one or more members of a second class of biomarkers (e.g., a class of biomarkers that is different from the first class that is tested for in the first sample), wherein the subject is diagnosed or identified as having a disease (e.g., cancer) or as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) and/or the subject is identified as a subject who is a candidate for increased monitoring when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and/or the presence of aneuploidy is detected (e.g., when the presence of the one or more members of the first class of biomarkers is detected, the presence of the one or more members of the second class of biomarkers is detected, and the presence of aneuploidy is detected).
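
As a non-limiting illustration, the "and/or" combinations described above can be sketched in Python as a pooling step followed by a decision rule. The names `SampleResult`, `flag_subject`, and `require_all` are hypothetical and are introduced here only for illustration; the sketch assumes each assay has already been reduced to a detected/not-detected result and does not represent any particular assay or classifier.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SampleResult:
    """Findings from one sample. None means that assay was not run on this sample."""
    genetic_detected: Optional[bool] = None
    protein_detected: Optional[bool] = None
    aneuploidy_detected: Optional[bool] = None


ANALYTES = ("genetic_detected", "protein_detected", "aneuploidy_detected")


def flag_subject(samples: Iterable[SampleResult], require_all: bool = False) -> bool:
    """Pool findings across samples, then apply an "or" (default) or "and" rule."""
    pooled = {}
    for sample in samples:
        for name in ANALYTES:
            value = getattr(sample, name)
            if value is not None:
                # An analyte counts as detected if any tested sample shows it.
                pooled[name] = pooled.get(name, False) or value
    if not pooled:
        return False  # nothing was tested, so nothing can be detected
    return all(pooled.values()) if require_all else any(pooled.values())


# First sample tested for two biomarker classes, second sample for aneuploidy.
first = SampleResult(genetic_detected=True, protein_detected=False)
second = SampleResult(aneuploidy_detected=True)
print(flag_subject([first, second]))                    # "or" reading
print(flag_subject([first, second], require_all=True))  # "and" reading
```

In this sketch the default corresponds to the "or" reading (any tested class or aneuploidy is detected), and `require_all=True` corresponds to the parenthetical "and" reading (every tested class and aneuploidy are detected).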

In some embodiments, a subject identified as a candidate for increased monitoring (e.g., using any of the variety of methods described herein) is also identified as a candidate for further diagnostic testing. Additionally or alternatively, a subject identified as a candidate for increased monitoring (e.g., using any of the variety of methods described herein) is also identified as a subject who will or is likely to respond to a treatment. Additionally or alternatively, a subject identified as a candidate for increased monitoring (e.g., using any of the variety of methods described herein) is also administered a treatment (e.g., any of the variety of therapeutic interventions described herein).

Genetic Biomarkers in Combination with Protein Biomarkers

In one aspect, provided herein are methods and materials for detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for diagnosing or identifying the presence of a disease in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from the subject.
In another aspect, provided herein are methods and materials for identifying a subject who will or is likely to respond to a treatment by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a subject as a candidate for further diagnostic testing by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a subject as a candidate for increased monitoring by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from the subject.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide high sensitivity in the detection or diagnosis of cancer (e.g., a high frequency or incidence of correctly identifying a subject as having cancer). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide a sensitivity in the detection or diagnosis of cancer (e.g., a high frequency or incidence of correctly identifying a subject as having cancer) that is higher than the sensitivity provided by separately detecting the presence of one or more members of a panel of genetic biomarkers or the presence of one or more members of a panel of protein biomarkers. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide a sensitivity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide high sensitivity in detecting a single type of cancer.
In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide high sensitivity in detecting two or more types of cancers. Any of a variety of cancer types can be detected using methods and materials provided herein (see, e.g., the section entitled “Cancers”). In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include pancreatic cancer. In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, or breast cancer. In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include cancers of the female reproductive tract (e.g., cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer). In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include bladder cancer or upper-tract urothelial carcinomas.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide high specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide a specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer) that is higher than the specificity provided by separately detecting the presence of one or more members of a panel of genetic biomarkers or the presence of one or more members of a panel of protein biomarkers. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide a specificity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. As will be understood by those of ordinary skill in the art, a specificity of 99% means that only 1% of subjects that do not have cancer are incorrectly identified as having cancer.
In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide high specificity in detecting a single cancer (e.g., there is a low probability of incorrectly identifying that subject as having that single cancer type). In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject provide high specificity in detecting two or more cancers (e.g., there is a low probability of incorrectly identifying that subject as having those two or more cancer types).
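
As a non-limiting illustration of the sensitivity and specificity figures discussed above, the following Python sketch computes both from counts of correct and incorrect calls using the standard definitions. The function names and the example counts are hypothetical and are not data from this disclosure.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of subjects with cancer who are correctly identified as having cancer."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no subjects with cancer were evaluated")
    return true_positives / total


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of subjects without cancer who are correctly identified as cancer-free."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no subjects without cancer were evaluated")
    return true_negatives / total


# Hypothetical cohort: 1,000 subjects without cancer, 10 of whom are incorrectly flagged.
spec = specificity(true_negatives=990, false_positives=10)
print(f"specificity = {spec:.2%}, false positive rate = {1 - spec:.2%}")
```

With these hypothetical counts, 990 of 1,000 cancer-free subjects are called correctly, which matches the statement that a specificity of 99% means that only 1% of subjects that do not have cancer are incorrectly identified as having cancer.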

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO).
In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO). In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject and that include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.
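
As a non-limiting illustration, the "one or more of" and "each of" variants of the 16-gene and 8-protein panels listed above can be expressed as set operations. The Python sketch below is hypothetical: `meets_embodiment` and its parameters are names introduced for illustration, and the inputs are assumed to be the genes in which a genetic biomarker was detected and the protein biomarkers that were detected.

```python
GENE_PANEL = frozenset({
    "NRAS", "CTNNB1", "PIK3CA", "FBXW7", "APC", "EGFR", "BRAF", "CDKN2A",
    "PTEN", "FGFR2", "HRAS", "KRAS", "AKT1", "TP53", "PPP2R1A", "GNAS",
})
PROTEIN_PANEL = frozenset({
    "CA19-9", "CEA", "HGF", "OPN", "CA125", "prolactin", "TIMP-1", "MPO",
})


def meets_embodiment(genes_with_biomarker, proteins_detected,
                     require_each_gene=False, require_each_protein=False):
    """Check detected findings against the panels under an 'any' or 'each' rule."""
    # Ignore anything that is not a member of the relevant panel.
    gene_hits = GENE_PANEL & set(genes_with_biomarker)
    protein_hits = PROTEIN_PANEL & set(proteins_detected)
    genes_ok = gene_hits == GENE_PANEL if require_each_gene else len(gene_hits) >= 1
    proteins_ok = (protein_hits == PROTEIN_PANEL if require_each_protein
                   else len(protein_hits) >= 1)
    return genes_ok and proteins_ok


# Hypothetical findings: biomarkers in two panel genes, one panel protein detected.
print(meets_embodiment({"KRAS", "TP53"}, {"CA19-9"}))
print(meets_embodiment({"KRAS", "TP53"}, {"CA19-9"}, require_each_protein=True))
```

The two flags correspond to the four combinations recited above: "one or more" or "each of" for the genes, paired with "one or more" or "each of" for the protein biomarkers.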

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3.
In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3.
In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject and that include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3.
In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject and that include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4, and 2) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and SMAD4, and 2) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, and OPN.
In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and SMAD4, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, and OPN. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject and that include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4, and 2) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, a subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing pancreatic cancer.

A sample obtained from a subject can be any of the variety of samples described herein that contains cell-free DNA (e.g., ctDNA) and/or proteins. In some embodiments, cell-free DNA (e.g., ctDNA) and/or proteins in a sample obtained from the subject are derived from a tumor cell. In some embodiments, cell-free DNA (e.g., ctDNA) in a sample obtained from the subject includes one or more genetic biomarkers. In some embodiments, proteins in a sample obtained from the subject include one or more protein biomarkers. Non-limiting examples of samples in which genetic biomarkers and/or protein biomarkers can be detected include blood, plasma, and serum. In some embodiments, the presence of one or more genetic biomarkers and the presence of one or more protein biomarkers are detected in a single sample obtained from the subject. In some embodiments, the presence of one or more genetic biomarkers is detected in a first sample obtained from a subject, and the presence of one or more protein biomarkers is detected in a second sample obtained from the subject.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers (e.g., each member of a panel of genetic biomarkers) and the presence of one or more members of a panel of protein biomarkers (e.g., each member of a panel of protein biomarkers) in one or more samples obtained from a subject, an elevated level of one or more members of the panel of protein biomarkers can be detected. For example, an elevated level of a protein biomarker can be a level that is higher than a reference level. A reference level can be any level of the protein biomarker that is not associated with the presence of cancer. For example, a reference level of a protein biomarker can be a level that is present in a reference subject that does not have cancer or does not harbor a cancer cell. A reference level of a protein biomarker can be the average level that is present in a plurality of reference subjects that do not have cancer or do not harbor a cancer cell. A reference level of a protein biomarker in a subject determined to have cancer can be the level that was present in the subject prior to the onset of cancer. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, or each of): CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3.
In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, or each of): CA19-9, CEA, HGF, and/or OPN.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers (e.g., each member of a panel of genetic biomarkers) and the presence of one or more members of a panel of protein biomarkers (e.g., each member of a panel of protein biomarkers) in one or more samples obtained from a subject, a decreased level of one or more members of the panel of protein biomarkers can be detected. For example, a decreased level of a protein biomarker can be a level that is lower than a reference level. A reference level can be any level of the protein biomarker that is not associated with the presence of cancer. For example, a reference level of a protein biomarker can be a level that is present in a reference subject that does not have cancer or does not harbor a cancer cell. A reference level of a protein biomarker can be the average level that is present in a plurality of reference subjects that do not have cancer or do not harbor a cancer cell. A reference level of a protein biomarker in a subject determined to have cancer can be the level that was present in the subject prior to the onset of cancer. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, or each of): CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3.
In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, or each of): CA19-9, CEA, HGF, and/or OPN.
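
As a non-limiting illustration, the comparisons to a reference level described above (an elevated level being higher than the reference level, a decreased level being lower) can be sketched in Python. The sketch assumes the reference level is taken as the average level in a plurality of reference subjects that do not have cancer, which is one of the options described above; the function names and numeric values are hypothetical, and units are arbitrary.

```python
from statistics import mean


def reference_level(levels_in_reference_subjects):
    """Average level of one protein biomarker across cancer-free reference subjects."""
    return mean(levels_in_reference_subjects)


def is_elevated(measured_level, reference):
    """An elevated level is a level that is higher than the reference level."""
    return measured_level > reference


def is_decreased(measured_level, reference):
    """A decreased level is a level that is lower than the reference level."""
    return measured_level < reference


# Hypothetical levels of a single protein biomarker (arbitrary units).
reference = reference_level([10.0, 20.0, 30.0])
print(reference, is_elevated(25.0, reference), is_decreased(25.0, reference))
```

A practical assay would typically also account for measurement variability (for example, by applying a margin above or below the reference level); that refinement is an assumption outside what is stated here and is omitted from the sketch.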

In some embodiments, when a subject is determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer (e.g., by detecting: 1) the presence of one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) the presence of one or more protein biomarkers in any of the panels described herein as being useful in conjunction with this genetic biomarker panel), the subject is selected as a candidate for (e.g., is selected for) further diagnostic testing (e.g., any of the variety of further diagnostic testing methods described herein), the subject is selected as a candidate for (e.g., is selected for) increased monitoring (e.g., any of the variety of increased monitoring methods described herein), the subject is identified as a subject who will or is likely to respond to a treatment (e.g., any of the variety of therapeutic interventions described herein), the subject is selected as a candidate for (e.g., is selected for) a treatment, a treatment (e.g., any of the variety of therapeutic interventions described herein) is selected for the subject, and/or a treatment (e.g., any of the variety of therapeutic interventions described herein) is administered to the subject. For example, when a subject is determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer, the subject can undergo further diagnostic testing, which further diagnostic testing can confirm the presence of cancer in the subject. Additionally or alternatively, the subject can be monitored at an increased frequency. In some embodiments of a subject determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer in which the subject undergoes further diagnostic testing and/or increased monitoring, the subject can additionally be administered a therapeutic intervention. In some embodiments, after a subject is administered a therapeutic intervention, the subject undergoes additional further diagnostic testing (e.g., the same type of further diagnostic testing as was performed previously and/or a different type of further diagnostic testing) and/or continued increased monitoring (e.g., increased monitoring at the same or at a different frequency as was previously done). In some embodiments, after a subject is administered a therapeutic intervention and the subject undergoes additional further diagnostic testing and/or additional increased monitoring, the subject is administered another therapeutic intervention (e.g., the same therapeutic intervention as was previously administered and/or a different therapeutic intervention). In some embodiments, after a subject is administered a therapeutic intervention, the subject is tested for: 1) the presence of one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) the presence of one or more protein biomarkers in any of the panels described herein as being useful in conjunction with this genetic biomarker panel.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). The presence of aneuploidy in any chromosome or portion thereof (e.g., an arm of a chromosome) can be detected. In some embodiments of methods that include detecting the presence of genetic biomarkers, protein biomarkers, and aneuploidy, the presence of aneuploidy on one or more of chromosome arms 5q, 8q, and 9p is detected. In some embodiments of methods that include detecting the presence of genetic biomarkers, protein biomarkers, and aneuploidy, the presence of aneuploidy on one or more of chromosome arms 4p, 7q, 8q, and 9q is detected.
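As an illustration only, the arm-level check described above can be sketched in Python. The sketch assumes that an upstream analysis, which is not specified here, has already produced the set of chromosome arms called aneuploid in a sample. The function name `arms_with_aneuploidy` and the input format are hypothetical.

```python
# Chromosome-arm sets named in the two embodiments above.
ARM_SET_A = frozenset({"5q", "8q", "9p"})
ARM_SET_B = frozenset({"4p", "7q", "8q", "9q"})


def arms_with_aneuploidy(called_arms, arms_of_interest):
    """Return the arms of interest that were called aneuploid, sorted.

    `called_arms` is assumed to be an iterable of arm labels (e.g., "8q")
    produced by an upstream aneuploidy caller; that caller is outside
    the scope of this sketch.
    """
    return sorted(set(called_arms) & set(arms_of_interest))


# Hypothetical sample in which arms 8q and 17p were called aneuploid.
sample_calls = ["8q", "17p"]
hits_a = arms_with_aneuploidy(sample_calls, ARM_SET_A)
hits_b = arms_with_aneuploidy(sample_calls, ARM_SET_B)
```

In this hypothetical sample, arm 8q belongs to both arm sets, so each call returns a single hit, while 17p is ignored because it is in neither set.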

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample).
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO), the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO), the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample).
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), and 3) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.
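The three-class criterion in the preceding embodiment can be sketched as follows. This is a minimal illustration, not a description of any particular assay: the class `SampleFindings`, its field names, and the function `meets_three_class_criterion` are hypothetical, and how each finding is detected upstream is left unspecified.

```python
from dataclasses import dataclass, field

# Gene and protein panels as listed in the embodiment above.
GENE_PANEL = frozenset({
    "NRAS", "CTNNB1", "PIK3CA", "FBXW7", "APC", "EGFR", "BRAF", "CDKN2A",
    "PTEN", "FGFR2", "HRAS", "KRAS", "AKT1", "TP53", "PPP2R1A", "GNAS",
})
PROTEIN_PANEL = frozenset({
    "CA19-9", "CEA", "HGF", "OPN", "CA125", "prolactin", "TIMP-1", "MPO",
})


@dataclass
class SampleFindings:
    """Hypothetical container for upstream assay results."""
    genes_with_biomarkers: set = field(default_factory=set)
    proteins_detected: set = field(default_factory=set)
    aneuploidy_detected: bool = False


def meets_three_class_criterion(findings):
    """True when all three classes are present: at least one panel gene,
    at least one panel protein, and aneuploidy."""
    has_genetic = bool(findings.genes_with_biomarkers & GENE_PANEL)
    has_protein = bool(findings.proteins_detected & PROTEIN_PANEL)
    return has_genetic and has_protein and findings.aneuploidy_detected


# Hypothetical example: one panel gene, one panel protein, aneuploidy present.
example = SampleFindings({"KRAS"}, {"CA125"}, True)
result = meets_three_class_criterion(example)
```

Findings outside the listed panels are ignored by the set intersections, so a biomarker in a gene that is not among the 16 listed genes does not satisfy the genetic condition in this sketch.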

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample).
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample).
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, and 3) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample).
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample).
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, 2) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, and 3) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4, and 2) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and SMAD4, and 2) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample).
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, and OPN, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and SMAD4, and 2) each of the following protein biomarkers: CA19-9, CEA, HGF, and OPN, the methods further include detecting the presence of aneuploidy in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4, and 2) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, a subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing pancreatic cancer.
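The KRAS codon restriction mentioned above (codons 12 and/or 61) can be illustrated with a short Python sketch. It assumes that variants arrive as short protein-level substitution strings such as "G12D" (reference residue, codon number, altered residue). That input format and the function name `kras_codon_hit` are assumptions made for illustration and are not drawn from the methods described herein.

```python
import re

# Codons named in the embodiment above.
KRAS_CODONS_OF_INTEREST = frozenset({12, 61})

# Hypothetical input format: one-letter reference residue, codon number,
# one-letter altered residue (or "*" for a stop), e.g., "G12D".
_PROTEIN_CHANGE = re.compile(r"^[A-Z](\d+)[A-Z*]$")


def kras_codon_hit(protein_change):
    """Return True if the substitution falls in KRAS codon 12 or 61."""
    match = _PROTEIN_CHANGE.match(protein_change)
    if match is None:
        # Unrecognized notation is treated as "no hit" in this sketch.
        return False
    return int(match.group(1)) in KRAS_CODONS_OF_INTEREST


hit_g12d = kras_codon_hit("G12D")
hit_g13d = kras_codon_hit("G13D")
```

A production pipeline would more likely work from genomic coordinates and a transcript annotation; string parsing is used here only to keep the example self-contained.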

In some embodiments, any of the variety of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject further include detecting the presence of one or more members of one or more additional classes of biomarkers. Non-limiting examples of such additional classes of biomarkers include: copy number changes, DNA methylation changes, other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements), peptides, and/or metabolites.

In some embodiments, the one or more additional classes of biomarkersinclude a metabolite biomarker. In some embodiments, a subject isdetermined to be at elevated risk of having or developing cancer if thebiological sample contains one or more metabolites indicative of cancer.In some embodiments, a subject is determined as having cancer if thebiological sample contains one or more metabolites indicative of cancer.Non-limiting examples of metabolites indicative of cancer include:5-methylthioadenosine (MTA), Glutathione reduced (GSH),N-acetylglutamate, Lactose, N-acetylneuraminate, UDP-acetylglucosamine,UDP-Acetylgalactosamine, UDP-glucuronate, Pantothenate, Arachidonate(20:4n6), Choline, Cytidine 5′-diphosphocholine, Dihomo-linolenate(20:3n3), Docosapentaenoate (DPA 22:5n3), Eicosapentaenoate (EPA20:5n3), Glycerophosphorylcholine (GPC), Docosahexaenoate (DHA 22:6n3),Linoleate (18:2n6), Cytidine 5′-monophosphate (5′-CMP),Gamma-glutamylglutamate, X-14577, X-11583, Isovalerylcarnitine,Phosphocreatine, 2-Aminoadipic acid, Gluconic acid, O-Acetylcarnitine,aspartic acid, Deamido-NAD+, glutamic acid, Isobutyrylcarnitine,Carnitine, Pyridoxal, Citric acid, Adenosine, ATP, valine, XC0061,Isoleucine, γ-Butyrobetaine, Lactic acid, alanine, phenylalanine,Gluconolactone, leucine, Glutathione (GSSG) divalent, tyrosine, NAD+,XC0016, UTP, creatine, Theobromine, CTP, GTP, 3-Methylhistidine,Succinic acid, Glycerol 3-phosphate, glutamine, 5-Oxoproline, Thiamine,Butyrylcarnitine, 4-Acetamidobutanoic acid, UDP-Glucose, UDP-Galactose,threonine, N-Acetylglycine, proline, ADP, Choline, Malic acid,S-Adenosylmethionine, Pantothenic acid, Cysteinesulfinic acid,6-Aminohexanoic acid, Homocysteic acid, Hydroxyproline, Methioninesulfoxide, 3-Guanidinopropionic acid, Glucose 6-phosphate, Phenaceturicacid, Threonic acid, tryptophan, Pyridoxine, N-Acetylaspartic acid,4-Guanidinobutyric acid, serine, Citrulline, Betaine,N-Acetylasparagine, 2-Hydroxyglutaric acid, arginine, Glutathione 
(GSH),creatinine, Dihydroxyacetone phosphate, histidine, glycine, Glucose1-phosphate, N-Formylglycine, Ketoprofen, lysine, beta-alanine,N-Acetylglutamic acid, 2-Amino-2-(hydroxymethyl)-1,3-propanediol,Ornithine, Phosphorylcholine, Glycerophosphocholine, Terephthalic acid,Glyceraldehyde 3-phosphate, Gly-Asp, Taurine, Fructose 1,6-diphosphate,3-Aminoisobutyric acid, Spermidine, GABA, Triethanolamine, Glycerol,N-Acetylserine, N-Acetylornithine, Diethanolamine, AMP, Cysteineglutathione disulfide, Streptomycin sulfate+H2O divalent,trans-Glutaconic acid, Nicotinic acid, Isobutylamine, Betainealdehyde+H2O, Urocanic acid, 1-Aminocyclopropane-1-carboxylic acidHomoserinelactone, 5-Aminovaleric acid, 3-Hydroxybutyric acid,Ethanolamine, Isovaleric acid, N-Methylglutamic acid, Cystathionine,Spermine, Carnosine, 1-Methylnicotinamide, N-Acetylneuraminic acid,Sarcosine, GDP, N-Methylalanine, palmitic acid,1,2-dioleoyl-sn-glycero-3-phospho-rac-glycerolcholesterol 5α,6αepoxidelanosterol, lignoceric acid, 1oleoyl_rac_GL, cholesterol_epoxide,erucic acid, T-LCA, oleoyl-L-carnitine, oleanolic acid,3-phosphoglycerate, 5-hydroxynorvaline, 5-methoxytryptamine,adenosine-5-monophosphate, alpha-ketoglutarate, asparagine, benzoicacid, hypoxanthine, maltose, maltotriose, methionine sulfoxide,nornicotine, phenol, Phosphoethanolamine, pyrophosphate, pyruvic acid,quinic acid, taurine, uric acid, inosine, lactamide, 5-hydroxynorvalineNIST, cholesterol, deoxypentitol, 2-hydroxyestrone, 2-hydroxyestradiol,2-metholyestrone, 2-metholxyestradiol, 2-hydroxyestrone-3-methyl ether,4-hydroxyestrone, 4-metholxyestrone, 4-methoxyestradiol,16alpha-hydroxyestrone, 17-epiestriol, estriol, 16-Ketoestradiol,16-epiestriol, acylcarnitine C18:1, amino acids citrulline andtrans-4-hydroxyproline, glycerophospholipids PC aa C28:1, PC ae C30:0and PC ae C30:2, and sphingolipid SM (OH) C14:1. 
See, e.g., Halama et al., Nesting of colon and ovarian cancer cells in the endothelial niche is associated with alterations in glycan and lipid metabolism, Scientific Reports volume 7, Article number: 39999 (2017); Hur et al., Systems approach to characterize the metabolism of liver cancer stem cells expressing CD133, Sci Rep., 7: 45557, doi: 10.1038/srep45557, (2017); Eliassen et al., Urinary Estrogens and Estrogen Metabolites and Subsequent Risk of Breast Cancer among Premenopausal Women, Cancer Res; 72(3); 696-706 (2011); Gangi et al., Metabolomic profile in pancreatic cancer patients: a consensus-based approach to identify highly discriminating metabolites, Oncotarget, February 2; 7(5): 5815-5829 (2016); Kumar et al., Serum and Plasma Metabolomic Biomarkers for Lung Cancer, Bioinformation, 13(6): 202-208, doi: 10.6026/97320630013202 (2017); Schmidt et al., Pre-diagnostic metabolite concentrations and prostate cancer risk in 1077 cases and 1077 matched controls in the European Prospective Investigation into Cancer and Nutrition, BMC Med., 15: 122, doi: 10.1186/s12916-017-0885-6 (2017); each of which is incorporated herein by reference in its entirety.

In some embodiments, the one or more additional classes of biomarkersinclude a peptide (e.g., a peptide that is distinct from the variousprotein biomarkers described herein as being useful in one or moremethods). In some embodiments, a subject is determined to be at elevatedrisk of having or developing cancer if the biological sample containsone or more peptides indicative of cancer. In some embodiments, asubject is determined as having cancer if the biological sample containsone or more peptides indicative of cancer. In some embodiments, apeptide is derived from a protein (e.g., the peptide includes an aminoacid sequence present in a protein biomarker or a different protein).Non-limiting examples of peptides indicative of cancer include thefollowing peptides and peptides derived from the following proteins:CEACAM, CYFRA21-1, CA125, PKLK, ProGRP, NSE, TPA 6, TPA 7, TPA 8, NRG,NRG 100, CNDP, APOB100, SCC, VEGF, EGFR, PIK3CA, HER2, BRAF, ROS, RET,NRAS, MET, MEK1, HER2, C4.4A, PSF3, FAM83B, ECD, CTNNB, VIM, S100A4,S100A7, COX2, MUC1, KLKB1, SAA, HP-β chain, C9, Pgrmc1, Ciz1,Transferrin, α-1 antitrypsin, apolipo protein 1, complement c3a,Caveolin-1, Kallikrein 6, Glucose regulated protein-8, αdefensing-1,-2,-3, Serum C-peptide, Alpha-2-HS glycol protein, TrypticKRT 8 peptide, Plasma glycol protein, Catenin, Defensin α 6, MMPs,Cyclin D, S100 P, Lamin A/C filament protein, Heat shock protein,aldehyde dehydrogenase, Tx1-2, (thioredoxin like protein-2), P53, nm23,u-PA, VEGF, Eph B4, CRABP2, WT-1, Rab-3D, Mesothelin, ERα, ANXA4, PSAT1,SPB5, CEA5, CEA6, AlAT, SLPI, APOA4, VDBP, HE4, IL-1, -6, -7, -8, -10,-11, -12, -16, -18, -21, -23, -28A, -33, LIF, TNFR1-2, HVEM (TNFRSF14),IL1R-a, IL1R-b, IL-2R, M-CSF, MIP-1a, TNF-α, CD40, RANTES, CD40L, MIF,IFN-β, MCP-4 (CCL13), MIG (CXCL9), MIP-1δ (CCL15), MIP3a (CCL20), MIP-4(CCL18), MPIF-1, SDF-1a+b (CXCL12), CD137/4-1BB, lymphotactin (XCL1),eotaxin-1 (CCL11), eotaxin-2 (CCL24), 6Ckine/CCL21), BLC (CXCL13), CTACK(CCL27), BCA-1 (CXCL13), 
HCC4 (CCL16), CTAP-3 (CXCL7), IGF1, VEGF,VEGFR3, EGFR, ErbB2, CTGF, PDGF AA, BB, PDGFRb, bFGF, TGFbRIII,β-cellulin, IGFBP1-4, 6, BDNF, PEDF, angiopoietin-2, renin,lysophosphatidic acid, β2-microglobulin, sialyl TN, ACE, CA 19-9, CEA,CA 15-3, CA-50, CA 72-4, OVX1, mesothelin, sialyl TN, MMP-2, -3, -7, -9,VAP-1, TIMP1-2, tenascin C, VCAM-1, osteopontin, KIM-1, NCAM,tetranectin, nidogen-2, cathepsin L, prostasin, matriptase, kallikreins2, 6, 10, cystatin C, claudin, spondin2, SLPI, bHCG, urinarygonadotropin peptide, inhibin, leptin, adiponectin, GH, TSH, ACTH, PRL,FSH, LH, cortisol, TTR, osteocalcin, insulin, ghrelin, GIP, GLP-1,amylin, glucagon, peptide YY, follistatin, hepcidin, CRP, Apo A1, CIII,H, transthyretin, SAA, SAP, complement C3,4, complement factor H,albumin, ceruloplasmin, haptoglobin, β-hemoglobin, transferrin,ferritin, fibrinogen, thrombin, von Willebrand factor, myoglobin,immunosuppressive acidic protein, lipid-associated sialic acid, S100A12(EN-RAGE), fetuin A, clusterin, α1-antitrypsin, a2-macroglobulin,serpin1 (human plasminogen activator inhibitor-1), Cox-1, Hsp27, Hsp60,Hsp80, Hsp90, lectin-type oxidized LDL receptor 1, CD14, lipocalin 2,ITIH4, sFasL, Cyfra21-1, TPA, perforin, DcR3, AGRP, creatine kinase-MB,human milk fat globule 1-2, NT-Pro-BNP, neuron-specific enolase, CASA,NB/70K, AFP, afamin, collagen, prohibitin, keratin-6, PARC, B7-H4,YK-L40, AFP-L3, DCP, GPC3, OPN, GP73, CK19, MDK, A2, 5-HIAA, CA15-3,CA19-9, CA27.29, CA72-4, calcitonin, CGA, BRAF V600E, BAP, BCT-ABLfusion protein, KIT, KRAS, PSA, Lactate dehydrogenase, NMP22, PAI-1,uPA, fibrin D-dimer, 5100, TPA, thyroglobulin, CD20, CD24, CD44,RS/DJ-1, p53, alpha-2-HS-glycoprotein, lipophilin B, beta-globin,hemopexin, UBE2N, PSMB6, PPP1CB, CPT2, COPA, MSK1/2, Pro-NPY,Secernin-1, Vinculin, NAAA, PTK7, TFG, MCCC2, TRAP1, IMPDH2, PTEN,POSTN, EPLIN, eIF4A3, DDAH1, ARG2, PRDX3&4, P4HB, YWHAG, EnoylCoA-hydrase, PHB, TUBB, KRT2, DES, HSP71, ATP5B, CKB, HSPD1, LMNA, EZH2,AMACR, FABP5, PPA2, EZR, 
SLP2, SM22, Bax, Smac/Diablo phosphorylatedBcl2, STAT3 and Smac/Diablo expression, PHB, PAP, AMACR, PSMA, FKBP4,PRDX4, KRT7/8/18, GSTP1, NDPK1, MTX2, GDF15, PCa-24, Caveolin-2,Prothrombin, Antithrombin-III, Haptoglobin, Serum amyloid A-1 protein,ZAG, ORM2, APOC3, CALML5, IGFBP2, MUC5AC, PNLIP, PZP, TIMP1, AMBP,inter-alpha-trypsin inhibitor heavy chain H1, inter-alpha-trypsininhibitor heavy chain H2, inter-alpha-trypsin inhibitor heavy chain H3,V-type proton ATPase subunit B, kidney isoform, Hepatocyte growthfactor-like protein, Serum amyloid P-component, Acylglycerol kinase,Leucine-rich repeat-containing protein 9, Beta-2-glycoprotein 1, Plasmaprotease C1 inhibitor, Lipoxygenase homology domain-containing protein1, Protocadherin alpha-13. See, e.g., Kuppusamy et al., Volume 24, Issue6, September 2017, Pages 1212-1221; Elzek and Rodland, Cancer MetastasisRev. 2015 March; 34(1): 83-96; Noel and Lokshin, Future Oncol. 2012January; 8(1): 55-71; Tsuchiya et al., World J Gastroenterol. 2015 Oct.7; 21(37): 10573-10583; Lou et al., Biomark Cancer. 2017; 9: 1-9; Parket al., Oncotarget. 2017 Jun. 27; 8(26): 42761-42771; Saraswat et al.,Cancer Med. 2017 July; 6(7): 1738-1751; Zamay et al., Cancers (Basel).2017 November; 9(11): 155; Tanase et al., Oncotarget. 2017 Mar. 14;8(11): 18497-18512, each of which is incorporated herein by reference inits entirety.

In some embodiments, the one or more additional classes of biomarkers include nucleic acid lesions or variations (e.g., a nucleic acid lesion or variation that is distinct from the various genetic biomarkers described herein as being useful in one or more methods). In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. Non-limiting examples of nucleic acid lesions or variations include copy number changes, DNA methylation changes, and/or other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements). Translocations and genomic rearrangements have been correlated with various cancers (e.g., prostate, glioma, lung cancer, non-small cell lung cancer, melanoma, and thyroid cancer) and used as biomarkers for years (e.g., Demeure et al., 2014, World J Surg., 38:1296-305; Hogenbirk et al., 2016, PNAS USA, 113:E3649-56; Gasi et al., 2011, PLoS One, 6:e16332; Ogiwara et al., 2008, Oncogene, 27:4788-97; U.S. Pat. Nos. 9,745,632; and 6,576,420). In addition, changes in copy number have been used as biomarkers for various cancers including, without limitation, head and neck squamous cell carcinoma, lymphoma (e.g., non-Hodgkin's lymphoma) and colorectal cancer (Kumar et al., 2017, Tumour Biol., 39:1010428317740296; Kumar et al., 2017, Tumour Biol., 39:1010428317736643; Henrique et al., 2014, Expert Rev. Mol. Diagn., 14:419-22; and U.S. Pat. No. 9,816,139). DNA methylation and changes in DNA methylation (e.g., hypomethylation, hypermethylation) also are used as biomarkers in cancer. For example, hypomethylation has been associated with hepatocellular carcinoma (see, for example, Henrique et al., 2014, Expert Rev. Mol. Diagn., 14:419-22), esophageal carcinogenesis (see, for example, Alvarez et al., 2011, PLoS Genet., 7:e1001356) and gastric and liver cancer (see, for example, U.S. Pat. No. 8,728,732), and hypermethylation has been associated with colorectal cancer (see, for example, U.S. Pat. No. 9,957,570). In addition to genome-wide changes in methylation, specific methylation changes within particular genes can be indicative of specific cancers (see, for example, U.S. Pat. No. 8,150,626). Li et al. (2012, J. Epidemiol., 22:384-94) provides a review of the association between numerous cancers (e.g., breast, bladder, gastric, lung, prostate, head and neck squamous cell, and nasopharyngeal) and aberrant methylation. Additionally or alternatively, additional types of nucleic acids or features of nucleic acids have been associated with various cancers. Non-limiting examples of such nucleic acids or features of nucleic acids include the following. The presence or absence of various microRNAs (miRNAs) has been used in the diagnosis of colon, prostate, colorectal, and ovarian cancers (see, for example, D'Souza et al., 2018, PLoS One, 13:e0194268; Fukagawa et al., 2017, Cancer Sci., 108:886-96; Giraldez et al., 2018, Methods Mol. Biol., 1768:459-74; U.S. Pat. Nos. 8,343,718; 9,410,956; and 9,074,206); for a review on the specific association of miR-22 with cancer, see Wang et al. (2017, Int. J. Oncol., 50:345-55). The abnormal expression of long non-coding RNAs (lncRNAs) also has been used as a biomarker in cancers such as prostate cancer, colorectal cancer, cervical cancer, melanoma, non-small cell lung cancer, gastric cancer, endometrial carcinoma, and hepatocellular carcinoma (see, for example, Wang et al., 2017, Oncotarget, 8:58577086; Wang et al., 2018, Mol. Cancer, 17:110; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4812-9; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:993-1002; Zhang et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4820-7; Zhang et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:2304-9; Xie et al., 2018, EBioMedicine, 33:57-67; and U.S. Pat. No. 9,410,206). The presence or absence of circular RNA (circRNA) has been used as a biomarker in lung cancer, breast cancer, gastric cancer, colorectal cancer, and liver cancer (e.g., Geng et al., 2018, J. Hematol. Oncol., 11:98) and melanoma (e.g., Zhang et al., 2018, Oncol. Lett., 16:1219-25). Changes in telomeric DNA (e.g., in length or in heterozygosity) or centromeric DNA (e.g., changes in expression of centromeric genes) also have been associated with cancers (e.g., prostate, breast, lung, lymphoma, and Ewing's sarcoma) (see, for example, Baretton et al., 1994, Cancer Res., 54:4472-80; Liscia et al., 1999, Br. J. Cancer, 80:821-6; Proctor et al., 2009, Biochim. Biophys. Acta, 1792:260-74; and Sun et al., 2016, Int. J. Cancer, 139:899-907). Various mutations (e.g., deletions), rearrangements and/or copy number changes in mitochondrial DNA (mtDNA) have been used prognostically and diagnostically for various cancers (e.g., prostate cancer, melanoma, breast cancer, lung cancer, and colorectal cancer) (see, for example, Maragh et al., 2015, Cancer Biomark., 15:763-73; Shen et al., 2010, Mitochondrion, 10:62-68; Hosgood et al., 2010, Carcinogen., 31:847-9; Thyagarajan et al., 2012, Cancer Epid. Biomarkers & Prev., 21:1574-81; and U.S. Pat. No. 9,745,632). The abnormal presence, absence or amount of messenger RNAs (mRNAs) also has been correlated with various cancers including, without limitation, breast cancer, Wilms' tumors, and cervical cancer (see, for example, Guetschow et al., 2012, Anal. Bioanaly. Chem., 404:399-406; Schwienbacher et al., 2000, Cancer Res., 60:1521-5; and Ngan et al., 1997, Genitourin Med., 73:54-8). Each of these citations is incorporated herein by reference in its entirety.

This document provides methods and materials for assessing and/or treating mammals (e.g., humans) having, or suspected of having, cancer. In some embodiments, this document provides methods and materials for identifying a mammal as having cancer. For example, a sample (e.g., a blood sample) obtained from a mammal can be assessed to determine if the mammal has cancer based, at least in part, on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers) in the sample. A biomarker panel (e.g., a set of one or more biomarkers) described herein can include the presence of two or more (e.g., three, five, nine, 10, 25, 100, 250, 500, 1000, 1500, 2000, 2500, or more) biomarkers (e.g., biomarkers associated with cancer). In some embodiments, a biomarker panel can include about 2,011 biomarkers (e.g., about 2,001 genomic biomarkers and about 10 peptide biomarkers). In some embodiments, methods and materials described herein also can include identifying the location (e.g., the anatomic site) of a cancer in a mammal. For example, a sample (e.g., a blood sample) obtained from a mammal can be assessed to determine the location of the cancer in the mammal based, at least in part, on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers). In some embodiments, methods and materials described herein also can include treating a mammal having cancer (e.g., administering one or more cancer treatments to treat the mammal). For example, a sample (e.g., a blood sample) obtained from a mammal can be assessed to determine if the mammal has cancer based, at least in part, on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers), and one or more cancer treatments can be administered to treat the mammal (e.g., to reduce the severity of the cancer, to reduce a symptom of the cancer, and/or to reduce the number of cancer cells present within the mammal).

The term “elevated level” as used herein with respect to a level of a peptide biomarker refers to any level that is greater than the reference level of the peptide typically observed in a sample (e.g., a reference sample) from one or more healthy mammals. In some embodiments, a reference sample can be a sample obtained from a mammal that does not have a cancer. For example, for a peptide biomarker associated with colorectal cancer, a reference sample can be a sample obtained from a subject that does not have colorectal cancer. In some embodiments, a reference sample can be a sample obtained from the same mammal in which the elevated level of a peptide biomarker is observed, where the reference sample was obtained prior to onset of the cancer. In some embodiments, such a reference sample obtained from the same mammal is frozen or otherwise preserved for future use as a reference sample. In some embodiments, when reference samples have undetectable levels of a peptide biomarker, an elevated level can be any detectable level of the peptide biomarker. It will be appreciated that levels from comparable samples are used when determining whether or not a particular level is an elevated level.
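
As a non-limiting illustration, the comparison described above can be sketched in a few lines of Python. The function name `is_elevated` and the use of `None` to represent an undetectable level are assumptions made for this sketch only; the units and the way a reference level is derived from comparable samples are left to the practitioner.

```python
from typing import Optional


def is_elevated(level: Optional[float], reference_level: Optional[float]) -> bool:
    """Return True if `level` is elevated relative to `reference_level`.

    Assumption for this sketch: `None` means "undetectable".
    Both values are expected to come from comparable samples.
    """
    if level is None:
        # An undetectable level is never treated as elevated.
        return False
    if reference_level is None:
        # Reference samples show no detectable peptide, so any
        # detectable level counts as elevated.
        return True
    # Otherwise, elevated means strictly greater than the reference level.
    return level > reference_level
```

For example, `is_elevated(5.0, 3.0)` evaluates to `True`, whereas `is_elevated(3.0, 3.0)` evaluates to `False`.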

Any appropriate mammal can be assessed and/or treated as described herein. A mammal can be a mammal having cancer. A mammal can be a mammal suspected of having cancer. In some embodiments, humans or other primates such as monkeys can be assessed for the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers) as described herein. In some embodiments, dogs, cats, horses, cows, pigs, sheep, mice, and rats can be assessed for the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers) as described herein. For example, a human can be assessed for the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers) as described herein and, optionally, can be treated with one or more cancer treatments as described herein.

Any appropriate sample from a mammal can be assessed as described herein (e.g., assessed for the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers)). In some embodiments, a sample can include DNA (e.g., genomic DNA). In some embodiments, a sample can include cell-free DNA (e.g., circulating tumor DNA (ctDNA)). In some embodiments, a sample can include peptides. For example, a sample can include circulating peptides (e.g., cancer related peptides). As used herein, a “circulating peptide” is a peptide that can be detected in any closed system (e.g., the circulatory system) within the body of a mammal. In some embodiments, a sample can be a fluid sample (e.g., a liquid biopsy). Examples of samples that can contain DNA and/or peptides include, without limitation, blood (e.g., whole blood, serum, or plasma), amnion, tissue, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, pap smears, breast milk, and exhaled breath condensate. For example, a plasma sample can be assessed for the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers) as described herein.

In some embodiments, a sample can be processed (e.g., to isolate and/or purify DNA and/or peptides from the sample). For example, DNA isolation and/or purification can include cell lysis (e.g., using detergents and/or surfactants), protein removal (e.g., using a protease), and/or RNA removal (e.g., using an RNase). As another example, peptide isolation and/or purification can include cell lysis (e.g., using detergents and/or surfactants), DNA removal (e.g., using a DNase), and/or RNA removal (e.g., using an RNase).

Any appropriate biomarkers can be used as described herein (e.g., to determine if a mammal has cancer based, at least in part, on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers) in the sample). Examples of biomarkers include, without limitation, genetic biomarkers, peptide biomarkers, metabolites, mRNA transcripts, miRNAs, methylation patterns (e.g., DNA methylation patterns), proteins (e.g., antibodies), and chromatin patterns. In some embodiments, the presence of one or more genetic biomarkers can be used to identify a mammal as having cancer. In some embodiments, an elevated level of one or more peptide biomarkers can be used to identify a mammal as having cancer. In some embodiments, the presence of one or more genetic biomarkers and an elevated level of one or more peptide biomarkers in combination can be used to identify a mammal as having cancer. In some embodiments, detecting the presence of one or more genetic biomarkers and an elevated level of one or more peptide biomarkers in combination can increase the specificity and/or sensitivity of detection as compared to detecting either genetic biomarkers or peptide biomarkers alone.
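
The effect of combining two classes of biomarkers on sensitivity and specificity can be illustrated with a small Python sketch. The combination rules shown here ("either class positive" and "both classes positive") are assumptions chosen for illustration and are not stated in this document as the rule used; the records are hypothetical toy data, not results. Under these assumptions, calling a sample positive when either class is positive can only maintain or raise sensitivity relative to each class alone, while requiring both classes can only maintain or raise specificity.

```python
from typing import List, Tuple


def sensitivity_specificity(truth: List[bool], calls: List[bool]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) of `calls` against `truth`."""
    tp = sum(t and c for t, c in zip(truth, calls))
    fn = sum(t and not c for t, c in zip(truth, calls))
    tn = sum((not t) and (not c) for t, c in zip(truth, calls))
    fp = sum((not t) and c for t, c in zip(truth, calls))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy records: (has_cancer, genetic_positive, peptide_positive).
RECORDS = [
    (True, True, False),
    (True, False, True),
    (True, True, True),
    (True, False, False),
    (False, False, False),
    (False, True, False),
    (False, False, True),
    (False, False, False),
]

TRUTH = [r[0] for r in RECORDS]
GENETIC = [r[1] for r in RECORDS]
PEPTIDE = [r[2] for r in RECORDS]
EITHER = [g or p for g, p in zip(GENETIC, PEPTIDE)]  # assumed "either" rule
BOTH = [g and p for g, p in zip(GENETIC, PEPTIDE)]   # assumed "both" rule

for name, calls in [("genetic", GENETIC), ("peptide", PEPTIDE),
                    ("either", EITHER), ("both", BOTH)]:
    sens, spec = sensitivity_specificity(TRUTH, calls)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Which rule, threshold, or statistical model is appropriate depends on the panel and the intended use; the sketch only shows how the two quantities move under the two simplest rules.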

A genetic biomarker can be any appropriate genetic biomarker. For example, a genetic biomarker can be a genetic biomarker associated with cancer. A genetic biomarker can include a modification in a gene. Examples of modifications include, without limitation, single base substitutions, insertions, deletions, indels, translocations, and copy number variations. A genetic biomarker can be in any appropriate gene. In some embodiments, a genetic biomarker can include a modification (e.g., an inactivating modification) in a tumor suppressor gene. In some embodiments, a genetic biomarker can include a modification (e.g., an activating modification) in an oncogene. Examples of genes that can include a genetic biomarker include, without limitation, NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, GNAS, JUN, ABCA7, ACVR1B, ACVR2A, AJUBA, AKT1, ALB, ALDOB, ALK, AMBRA1, AMER1, AMOT, ANKRD46, APC, AR, ARHGAP35, ARID1A, ARID1B, ARID2, ARID4B, ARL15, ARMCX1, ASXL1, ASXL2, ATAD2, ATG14, ATG5, ATM, ATRX, ATXN2, AXIN1, B2M, BAP1, BCL9, BCLAF1, BCOR, BIRC6, BIRC8, BLVRA, BRAF, BRCA1, BRCA2, BRD7, BRE, BRWD3, BTBD7, BTRC, C11orf70, C12orf57, C2CD5, C3orf62, C8orf34, CAMKV, CAPG, CASP8, CBFB, CBX4, CCAR1, CCDC117, CCDC88A, CCM2, CCNC, CCND1, CCR3, CD1D, CD79B, CDC73, CDCP1, CDH1, CDK12, CDK4, CDKN1A, CDKN1B, CDKN2A, CEBPA, CELF1, CENPB, CEP128, CHD2, CHD4, CHD8, CHEK2, CHRDL1, CHUK, CIC, CLEC4C, CMTR2, CNN2, CNOT1, CNOT4, COL11A1, COPS4, COX7B2, CREBBP, CSDE1, CSMD3, CTCF, CTDNEP1, CTNNB1, CUL1, CUL2, CYB5B, DACH1, DCHS1, DCUN1D1, DDX3X, DDX5, DHX15, DHX16, DICER1, DIRC2, DIS3, DIXDC1, DKK2, DNAJB5, DNER, DNM1L, DNMT3A, EED, EGFR, EIF1AX, EIF2AK3, EIF2S2, EIF4A1, EIF4A2, ELF3, EMG1, EMR3, EP300, EPB41L4A, EPHA2, EPS8, ERBB2, ERBB3, ERRFI1, EXO5, EZH2, F5, FANCM, FAT1, FBN2, FBXW7, FCER1G, FGFR1, FGFR2, FGFR3, FLT3, FN1, FOXA1, FUBP1, FUS, GALNTL5, GATA3, GGCT, GIGYF2, GK2, GLIPR2, GNPTAB, GNRHR, GOLM1, GOT2, GPS2, GPX7, GRK1, GSE1, GZMA, HDAC1, HERC1, HERC4, HGF, HIST1H2BO, HLA-A, HLA-B, HMCN1, HNRNPA1, HRAS, HSP90AB1, ID3, IDH1, IDH2, IFNGR2, IFT88, IKZF2, INO80C, INPP4A, INPPL1, IWS1, JAK1, JAK2, KANSL1, KATE, KATNAL1, KBTBD7, KCNMB4, KDM5C, KDM6A, KEAP1, KIAA1467, KLF4, KMT2A, KMT2B, KMT2C, KMT2D, KMT2E, KRAS, KRT15, LAMTOR1, LARP4B, LPAR2, LYN, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MAP4K3, MAPK1, MAX, MB21D2, MBD1, MBD6, MBNL1, MBNL3, MED12, MED23, MEN1, MGA, MKLN1, MLLT4, MOAP1, MORC4, MS4A1, MSI1, MTOR, MYC, MYCN, MYD88, MYL6, MYO1B, MYO6, NAA15, NAA25, NAP1L2, NAP1L4, NCOA2, NCOR1, NEK9, NF1, NF2, NFE2L2, NFE2L3, NIPBL, NIT1, NKX3-1, NME4, NOTCH1, NOTCH2, NPM1, NRAS, NSD1, PBRM1, PCBP1, PCOLCE2, PHF6, PIK3CA, PIK3CB, PIK3R1, POLA2, POT1, PPARD, PPM1D, PPP2R1A, PPP6C, PRKACA, PRKCI, PRPF40A, PSIP1, PTEN, PTH2, PTMS, PTN, PTPN11, RAB18, RAC1, RAF1, RANBP3L, RAPGEF6, RASA1, RB1, RBBP6, RBM10, RBM26, RC3H2, REL, RERE, RFC4, RHEB, RHOA, RIMS2, RIT1, RNF111, RNF43, RPL11, RPL5, RQCD1, RRAS2, RUNX1, RXRA, SARM1, SCAF11, SEC22A, SENP3, SENP8, SETD1B, SETD2, SF3A3, SF3B1, SFPQ, SIN3A, SKAP2, SMAD2, SMAD3, SMAD4, SMARCA4, SMARCB1, SMARCC2, SNCB, SOS1, SOX4, SOX9, SP3, SPEN, SPOP, SPSB2, STAG2, STK11, STK31, SUFU, TAF1A, TARDBP, TAS2R30, TBL1XR1, TBX3, TCF12, TCF7L2, TET2, TEX11, TFDP2, TGFBR2, THRAP3, TM9SF1, TMCO2, TMED10, TMEM107, TMEM30A, TMPO, TNFRSF9, TNRC6B, TP53, TP53BP1, TRAF3, TRIMS, TRIP12, TSC1, TTK, TTR, TUBA3C, U2AF1, UBE2D3, UBR5, UNC13C, UNKL, UPP1, USO1, USP28, USP9X, VHL, VN1R2, VPS33B, WAC, WDR33, WDR47, WT1, WWP1, XPO1, YOD1, ZC3H13, ZDHHC4, ZFHX3, ZFP36L1, ZFP36L2, ZGRF1, ZMYM3, ZMYM4, ZNF234, ZNF268, ZNF292, ZNF318, ZNF345, ZNF600, ZNF750, and ZNF800. For example, a genetic biomarker can be in one or more of NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS. In some embodiments, methods and materials described herein can include detecting one or more genetic biomarkers (e.g., one or more modifications in one or more genes). For example, methods and materials described herein can include detecting mutations in one or more genes encoding any of the proteins set forth in Example 1, or in one or more of the genes set forth in Table 3 or Table 5. In some embodiments, methods and materials described herein can include detecting one or more of the modifications set forth in Table 3 or Table 5. In some embodiments, methods and materials described herein can include detecting the presence or absence of about 2,001 modifications in about 16 genes. For example, methods and materials described herein can include detecting the presence or absence of about 2,001 genetic biomarkers in one or more of NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS. In some embodiments, genetic biomarkers can be as described elsewhere (see, e.g., Bettegowda et al., 2014 Science translational medicine 6:224ra224; Haber et al., 2014 Cancer Discov 4: 650-661; Dawson et al., 2013 N Engl J Med 368:1199-1209; Wang et al., 2015 Science translational medicine 7:293ra104; Forshew et al., 2012 Science translational medicine 4:136ra168; Abbosh et al., 2017 Nature 545:446-451; Beddowes et al., 2017 Breast 34(Suppl 1):S31-S35; and Phallen et al., 2017 Science translational medicine 9).

Any appropriate method can be used to detect the presence or absence of one or more biomarkers (e.g., genetic biomarkers) as described herein. In some embodiments, one or more genetic biomarkers can be detected independently (e.g., via singleplex DNA tools). In some embodiments, one or more genetic biomarkers can be detected simultaneously (e.g., via multiplex DNA tools such as “chips” or microarrays). Examples of methods for detecting genetic biomarkers include, without limitation, sequencing (e.g., PCR-based sequencing such as multiplex PCR-based sequencing), DNA hybridization methods (e.g., Southern blotting), restriction enzyme digestion methods, PCR-based multiplex methods, digital PCR methods, droplet digital PCR (ddPCR) methods, PCR-based singleplex methods, Sanger sequencing methods, next-generation sequencing methods (e.g., single-molecule real-time sequencing, nanopore sequencing, and Polony sequencing), quantitative PCR methods, ligation methods, and microarray methods. In some embodiments, methods and materials described herein can include multiplex PCR-based sequencing. For example, methods and materials described herein can include multiplex PCR-based sequencing as set forth in Example 1. In some embodiments of methods provided herein, the presence of one or more mutations present in a sample obtained from a subject is detected using a method that increases the sensitivity of massively parallel sequencing instruments by means of an error reduction technique. For example, such techniques can permit the detection of rare mutant alleles in a range of 1 mutant template among 5,000 to 1,000,000 wild-type templates. In some embodiments, the presence of one or more mutations present in a sample obtained from a subject is detected by amplifying DNA (e.g., DNA obtained from cells in a sample or cell-free DNA) from regions of interest (e.g., regions including one or more genetic biomarkers) to form families of amplicons in which each member of a family is derived from a single template molecule (e.g., a single region of interest) in the cell-free DNA, wherein each member of a family is marked by a common oligonucleotide barcode, and wherein each family is marked by a distinct oligonucleotide barcode. For example, the presence of one or more mutations present in a sample obtained from a subject can be detected by assigning a unique identifier (UID) to each template molecule, amplifying each uniquely tagged template molecule to create UID-families, and redundantly sequencing the amplification products. In some embodiments, the oligonucleotide barcode is introduced into the template molecule by a step of amplifying with a population of primers that collectively contain a plurality of oligonucleotide barcodes. In some embodiments, the oligonucleotide barcode is endogenous to the template molecule, and an adapter comprising a DNA synthesis priming site is ligated to an end of the template molecule adjacent to the oligonucleotide barcode. See, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108:9530-9535.
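
By way of a hedged illustration, the UID-family approach described above (tag each template, amplify, redundantly sequence, then compare reads within each family) can be sketched in Python as follows. The function names, the minimum family size of 2, the 95% agreement threshold, and the whole-read comparison are all assumptions made for this sketch; they are not taken from this document, and a practical pipeline would typically work per position after alignment.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_uid(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (uid, sequence) reads into UID-families."""
    families: Dict[str, List[str]] = defaultdict(list)
    for uid, seq in reads:
        families[uid].append(seq)
    return dict(families)


def family_consensus(seqs: List[str], min_size: int = 2,
                     min_agreement: float = 0.95) -> Optional[str]:
    """Return the family's consensus sequence, or None if unreliable.

    A family is discarded if it has fewer than `min_size` reads or if the
    most common sequence is shared by less than `min_agreement` of reads
    (both thresholds are assumptions of this sketch).
    """
    if len(seqs) < min_size:
        return None
    top_seq, count = Counter(seqs).most_common(1)[0]
    return top_seq if count / len(seqs) >= min_agreement else None


def mutant_fraction(reads: Iterable[Tuple[str, str]], wild_type: str) -> float:
    """Fraction of reliable UID-families whose consensus differs from wild type."""
    consensus = [
        c for c in (family_consensus(s) for s in group_by_uid(reads).values())
        if c is not None
    ]
    if not consensus:
        return 0.0
    return sum(c != wild_type for c in consensus) / len(consensus)
```

Because an error introduced during amplification or sequencing usually appears in only some reads of a family, requiring agreement within each family filters out most such errors before mutant templates are counted.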

In some embodiments of methods provided herein, the presence of one or more mutations present in a sample obtained from a subject is detected using sequencing technology (e.g., a next-generation sequencing technology). A variety of sequencing technologies are known in the art. For example, methods for detection and characterization of circulating tumor DNA in cell-free DNA are described elsewhere (see, e.g., Haber and Velculescu, 2014 Cancer Discov 4:650-61). Non-limiting examples of such techniques include SafeSeqs (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA; 108:9530-5), OnTarget (see, e.g., Forshew et al., 2012 Sci Transl Med; 4:136ra68), and TamSeq (see, e.g., Thompson et al., 2012 PLoS ONE, 7:e31597). In some embodiments, the presence of one or more mutations present in a sample obtained from a subject is detected using droplet digital PCR (ddPCR), a method that is known to be highly sensitive for mutation detection. In some embodiments, the presence of one or more mutations present in a sample obtained from a subject is detected using other sequencing technologies, including but not limited to, chain-termination techniques, shotgun techniques, sequencing-by-synthesis methods, methods that utilize microfluidics, other capture technologies, or any of the other sequencing techniques known in the art that are useful for detection of small amounts of DNA in a sample (e.g., ctDNA in a cell-free DNA sample).

In some embodiments, the presence of one or more mutations present in a sample obtained from a subject is detected using array-based methods. For example, the step of detecting a genetic alteration (e.g., one or more genetic alterations) in cell-free DNA is performed using a DNA microarray. In some embodiments, a DNA microarray can detect one or more of a plurality of cancer cell mutations. In some embodiments, cell-free DNA is amplified prior to detecting the genetic alteration. Non-limiting examples of array-based methods that can be used in any of the methods described herein include: a complementary DNA (cDNA) microarray (see, e.g., Kumar et al. 2012 J. Pharm. Bioallied Sci. 4(1):21-26; Laere et al. 2009 Methods Mol. Biol. 512:71-98; Mackay et al. 2003 Oncogene 22:2680-2688; Alizadeh et al. 1996 Nat. Genet. 14:457-460), an oligonucleotide microarray (see, e.g., Kim et al. 2006 Carcinogenesis 27(3):392-404; Lodes et al. 2009 PLoS One 4(7):e6229), a bacterial artificial chromosome (BAC) clone chip (see, e.g., Chung et al. 2004 Genome Res. 14(1):188-196; Thomas et al. 2005 Genome Res. 15(12):1831-1837), a single-nucleotide polymorphism (SNP) microarray (see, e.g., Mao et al. 2007 Curr. Genomics 8(4):219-228; Jasmine et al. 2012 PLoS One 7(2):e31968), a microarray-based comparative genomic hybridization array (array-CGH) (see, e.g., Beers and Nederlof, 2006 Breast Cancer Res. 8(3):210; Pinkel et al. 2005 Nat. Genetics 37:S11-S17; Michels et al. 2007 Genet. Med. 9:574-584), and a molecular inversion probe (MIP) assay (see, e.g., Wang et al. 2012 Cancer Genet 205(7-8):341-55; Lin et al. 2010 BMC Genomics 11:712). In some embodiments, the cDNA microarray is an Affymetrix microarray (see, e.g., Irizarry 2003 Nucleic Acids Res 31:e15; Dalma-Weiszhausz et al. 2006 Methods Enzymol. 410:3-28), a NimbleGen microarray (see, e.g., Wei et al. 2008 Nucleic Acids Res 36(9):2926-2938; Albert et al. 2007 Nat. Methods 4:903-905), an Agilent microarray (see, e.g., Hughes et al. 2001 Nat. Biotechnol. 19(4):342-347), or a BeadArray array (see, e.g., Liu et al. 2017 Biosens Bioelectron 92:596-601). In some embodiments, the oligonucleotide microarray is a DNA tiling array (see, e.g., Mockler and Ecker, 2005 Genomics 85(1):1-15; Bertone et al. 2006 Genome Res 16(2):271-281). Other suitable array-based methods are known in the art.

In some embodiments, multiplex PCR-based sequencing can include a number of amplicons that provides improved sensitivity of detection of one or more genetic biomarkers. For example, multiplex PCR-based sequencing can include about 60 amplicons (e.g., 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, or 70 amplicons). In some embodiments, multiplex PCR-based sequencing can include 61 amplicons. An amplicon can be any appropriate size (e.g., can include any appropriate number of nucleotides). In some embodiments, an amplicon can include no more than 1000 (e.g., about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, or about 900) nucleotides. In some embodiments, an amplicon can include at least 6 (e.g., about 6, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, or about 50) nucleotides. Amplicons produced using multiplex PCR-based sequencing can include nucleic acids having a length from about 15 bp to about 1000 bp (e.g., from about 6 bp to about 800 bp, from about 10 bp to about 700 bp, from about 15 bp to about 600 bp, from about 20 bp to about 600 bp, from about 25 bp to about 500 bp, from about 30 bp to about 400 bp, from about 35 bp to about 300 bp, from about 40 bp to about 200 bp, from about 45 bp to about 100 bp, from about 50 bp to about 95 bp, from about 55 bp to about 90 bp, from about 66 bp to about 80 bp, from about 25 bp to about 1000 bp, from about 35 bp to about 1000 bp, from about 50 bp to about 1000 bp, from about 100 bp to about 1000 bp, from about 250 bp to about 1000 bp, from about 500 bp to about 1000 bp, from about 750 bp to about 1000 bp, from about 15 bp to about 750 bp, from about 15 bp to about 500 bp, from about 15 bp to about 300 bp, from about 15 bp to about 200 bp, from about 15 bp to about 100 bp, from about 15 bp to about 80 bp, from about 15 bp to about 75 bp, from about 15 bp to about 50 bp, from about 15 bp to about 40 bp, from about 15 bp to about 30 bp, from about 15 bp to about 20 bp, from about 20 bp to about 100 bp, from about 25 bp to about 50 bp, or from about 30 bp to about 40 bp). For example, amplicons produced using multiplex PCR-based sequencing can include nucleic acids having a length of about 33 bp.

A peptide biomarker can be any appropriate peptide biomarker. In some embodiments, a peptide biomarker can be a peptide biomarker associated with cancer. For example, a peptide biomarker can be a peptide having elevated levels in a cancer (e.g., as compared to a reference level of the peptide). Examples of peptide biomarkers include, without limitation, AFP, Angiopoietin-2, AXL, CA125, CA 15-3, CA19-9, CD44, CEA, CYFRA21-1, DKK1, Endoglin, FGF2, Follistatin, Galectin-3, G-CSF, GDF15, HE4, HGF, IL-6, IL-8, Kallikrein-6, Leptin, LRG-1, Mesothelin, Midkine, Myeloperoxidase, NSE, OPG, OPN, PAR, Prolactin, sEGFR, sFas, SHBG, sHER2/sEGFR2/sErbB2, sPECAM-1, TGFa, Thrombospondin-2, TIMP-1, TIMP-2, and Vitronectin. For example, a peptide biomarker can include one or more of OPN, IL-6, CEA, CA125, HGF, Myeloperoxidase, CA19-9, Midkine and/or TIMP-1. In some embodiments, methods and materials described herein can include one or more peptide biomarkers (e.g., one or more peptides having elevated levels in a cancer). For example, methods and materials described herein can include one or more of the peptide biomarkers set forth in Example 1. For example, methods and materials described herein can include detecting elevated levels of one or more of the peptide biomarkers set forth in Table 4. In some embodiments, methods and materials described herein can include detecting the levels of about 10 peptides. For example, methods and materials described herein can include detecting the level of OPN, IL-6, CEA, CA125, HGF, Myeloperoxidase, CA19-9, Midkine, and/or TIMP-1. In some embodiments, peptide biomarkers can be as described elsewhere (see, e.g., Liotta et al., 2003 Clin Adv Hematol Oncol 1:460-462; Wang et al., 2016 Expert Rev Proteomics 13: 99-114; and Patz, Jr. et al., 2007 J Clin Oncol 25:5578-5583).

Any appropriate method can be used to detect the level (e.g., an elevated level) of one or more biomarkers (e.g., peptide biomarkers) as described herein. In some embodiments, the levels of one or more peptide biomarkers can be detected independently (e.g., via singleplex peptide tools). In some embodiments, the levels of one or more peptide biomarkers can be detected simultaneously (e.g., via multiplex peptide tools such as “chips” or microarrays). Examples of methods for detecting peptide levels include, without limitation, spectrometry methods (e.g., high-performance liquid chromatography (HPLC) and liquid chromatography-mass spectrometry (LC/MS)), antibody dependent methods (e.g., enzyme-linked immunosorbent assay (ELISA), protein immunoprecipitation, immunoelectrophoresis, western blotting, and protein immunostaining), and aptamer dependent methods. In some embodiments, the level of one or more peptide biomarkers can be detected as described in the Examples. For example, the level of one or more peptide biomarkers can be detected by multiplex immunoassay.

Any appropriate cancer can be identified and/or treated as described herein. In some embodiments, a cancer can be a common cancer. In some embodiments, a cancer can be a cancer where no blood-based test is available. In some embodiments, a cancer can be a cancer where no test for early detection is available. In some embodiments, a cancer can be a Stage I cancer. In some embodiments, a cancer can be a Stage II cancer. In some embodiments, a cancer can be a Stage III cancer. In some embodiments, a cancer can be a Stage IV cancer. In some embodiments, a cancer can be a surgically resectable cancer. Examples of cancers that can be identified as described herein (e.g., based at least in part on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers)) include, without limitation, liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, and prostate cancer.

Methods and materials provided herein also can identify the location of cancer (e.g., can determine the cancer site and/or type) in a mammal. The location of any cancer (e.g., the cancer site and/or type) that has mutations in one or more genetic biomarkers and/or one or more peptide biomarkers as described herein can be determined. In some embodiments, materials and methods provided herein can identify the presence of a colorectal cancer. For example, the presence of one or more genetic biomarkers in one or more of APC, KRAS, and/or TP53 and an elevated level of CEA in a sample obtained from a mammal can be used to identify the presence of a colorectal cancer in the mammal. In some embodiments, materials and methods provided herein can identify the presence of a liver cancer. For example, the presence of one or more genetic biomarkers in one or more of TP53, CTNNB1, and/or TERT and an elevated level of AFP in a sample obtained from a mammal can be used to identify the presence of a liver cancer in the mammal. In some embodiments, materials and methods provided herein can identify the presence of an ovarian cancer. For example, the presence of one or more genetic biomarkers in TP53 and an elevated level of CA125 in a sample obtained from a mammal can be used to identify the presence of an ovarian cancer in the mammal. In some embodiments, materials and methods provided herein can identify the presence of a pancreatic cancer. For example, the presence of one or more genetic biomarkers in KRAS (e.g., KRAS codon 12) and an elevated level of CA19-9 in a sample obtained from a mammal can be used to identify the presence of a pancreatic cancer in the mammal.
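
The four example pairings above (mutated gene or genes plus an elevated peptide) can be written as a simple lookup, sketched here in Python. The gene and peptide names come from the examples above; the table name `LOCATION_RULES`, the function `candidate_locations`, and the treatment of each example as an "any listed gene AND the listed peptide" rule are assumptions of this sketch and are not presented as the classification method of this document.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Site -> (genes in which a genetic biomarker may be present, peptide with elevated level).
LOCATION_RULES: Dict[str, Tuple[FrozenSet[str], str]] = {
    "colorectal": (frozenset({"APC", "KRAS", "TP53"}), "CEA"),
    "liver": (frozenset({"TP53", "CTNNB1", "TERT"}), "AFP"),
    "ovarian": (frozenset({"TP53"}), "CA125"),
    "pancreatic": (frozenset({"KRAS"}), "CA19-9"),
}


def candidate_locations(mutated_genes: Set[str], elevated_peptides: Set[str]) -> List[str]:
    """Return sites whose example rule is satisfied by the sample.

    A rule is satisfied when at least one listed gene carries a genetic
    biomarker and the listed peptide is elevated (assumed rule shape).
    """
    return [
        site
        for site, (genes, peptide) in LOCATION_RULES.items()
        if genes & mutated_genes and peptide in elevated_peptides
    ]
```

For instance, `candidate_locations({"KRAS"}, {"CA19-9"})` returns `["pancreatic"]`. Because some genes (e.g., TP53) appear in more than one rule, a sample can match several sites, in which case further testing would be needed to resolve the location.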

In some embodiments, a mammal identified as having cancer as described herein (e.g., based at least in part on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers)) can have the cancer diagnosis confirmed using any appropriate method. Examples of methods that can be used to diagnose or confirm diagnosis of a cancer include, without limitation, physical examinations (e.g., pelvic examination), imaging tests (e.g., ultrasound or CT scans), cytology, and tissue tests (e.g., biopsy).

In some embodiments, any of the variety of methods disclosed herein can be performed on subjects who have previously undergone treatments for cancer. In some embodiments, methods provided herein can be used to determine the efficacy of the treatment. For example, a subject having cancer can be administered a treatment (also referred to herein as a “therapeutic intervention”), after which the continued presence of cancer or the amount of cancer (or lack thereof) is determined by detecting the presence of one or more mutations in one or more genes (e.g., NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS) and/or elevated levels of one or more peptide biomarkers (e.g., OPN, IL-6, CEA, CA125, HGF, Myeloperoxidase, CA19-9, Midkine, and/or TIMP-1).

In some embodiments, once a subject has been determined to have acancer, the subject may be additionally monitored or selected forincreased monitoring. In some embodiments, methods provided herein canbe used to select a subject for increased monitoring at a time periodprior to the time period when conventional techniques are capable ofdiagnosing the subject with an early-stage cancer. For example, methodsprovided herein for selecting a subject for increased monitoring can beused when a subject has not been diagnosed with cancer by conventionalmethods and/or when a subject is not known to harbor a cancer. In someembodiments, a subject selected for increased monitoring can beadministered a diagnostic test (e.g., any of the diagnostic testsdisclosed herein) at an increased frequency compared to a subject thathas not been selected for increased monitoring. For example, a subjectselected for increased monitoring can be administered a diagnostic testat a frequency of twice daily, daily, bi-weekly, weekly, bi-monthly,monthly, quarterly, semi-annually, annually, or any at frequencytherein. In some embodiments, a subject selected for increasedmonitoring can be administered a one or more additional diagnostic testscompared to a subject that has not been selected for increasedmonitoring. For example, a subject selected for increased monitoring canbe administered two diagnostic tests, whereas a subject that has notbeen selected for increased monitoring is administered only a singlediagnostic test (or no diagnostic tests). In some embodiments, a subjectthat has been selected for increased monitoring can also be selected forfurther diagnostic testing. 
Once the presence of a cancer cell has been identified (e.g., by any of the variety of methods disclosed herein), it may be beneficial for the subject to undergo both increased monitoring (e.g., to assess the progression of the tumor or cancer in the subject and/or to assess the development of additional cancer cell mutations), and further diagnostic testing (e.g., to determine the size and/or exact location of the tumor harboring the cancer cell). In some embodiments, a therapeutic intervention is administered to the subject that is selected for increased monitoring after a cancer cell mutation is detected. Any of the therapeutic interventions disclosed herein or known in the art can be administered. For example, a subject that has been selected for increased monitoring can be further monitored, and a therapeutic intervention can be administered if the presence of the cancer cell is maintained throughout the increased monitoring period. Additionally or alternatively, a subject that has been selected for increased monitoring can be administered a therapeutic intervention, and further monitored as the therapeutic intervention progresses. In some embodiments, after a subject that has been selected for increased monitoring has been administered a therapeutic intervention, the increased monitoring will reveal one or more additional cancer cell mutations. In some embodiments, such one or more additional cancer cell mutations will provide cause to administer a different therapeutic intervention (e.g., a resistance mutation may arise in a cancer cell during the therapeutic intervention, which cancer cell harboring the resistance mutation is resistant to the original therapeutic intervention).

In some embodiments, once a subject has been determined to have a cancer, the subject may be administered further tests or selected for further diagnostic testing. In some embodiments, methods provided herein can be used to select a subject for further diagnostic testing at a time period prior to the time period when conventional techniques are capable of diagnosing the subject with an early-stage cancer. For example, methods provided herein for selecting a subject for further diagnostic testing can be used when a subject has not been diagnosed with cancer by conventional methods and/or when a subject is not known to harbor a cancer. In some embodiments, a subject selected for further diagnostic testing can be administered a diagnostic test (e.g., any of the diagnostic tests disclosed herein) at an increased frequency compared to a subject that has not been selected for further diagnostic testing. For example, a subject selected for further diagnostic testing can be administered a diagnostic test at a frequency of twice daily, daily, bi-weekly, weekly, bi-monthly, monthly, quarterly, semi-annually, annually, or at any frequency therein. In some embodiments, a subject selected for further diagnostic testing can be administered one or more additional diagnostic tests compared to a subject that has not been selected for further diagnostic testing. For example, a subject selected for further diagnostic testing can be administered two diagnostic tests, whereas a subject that has not been selected for further diagnostic testing is administered only a single diagnostic test (or no diagnostic tests). In some embodiments, the diagnostic testing method can determine the presence of the same type of cancer as the cancer that was originally detected. Additionally or alternatively, the diagnostic testing method can determine the presence of a different type of cancer than the cancer that was originally detected. In some embodiments, the diagnostic testing method is a scan.
In some embodiments, the scan is a computed tomography (CT), a CT angiography (CTA), an esophagram (a barium swallow), a barium enema, a magnetic resonance imaging (MRI), a PET scan, an ultrasound (e.g., an endobronchial ultrasound, an endoscopic ultrasound), an X-ray, or a DEXA scan. In some embodiments, the diagnostic testing method is a physical examination, such as an anoscopy, a bronchoscopy (e.g., an autofluorescence bronchoscopy, a white-light bronchoscopy, a navigational bronchoscopy), a colonoscopy, a digital breast tomosynthesis, an endoscopic retrograde cholangiopancreatography (ERCP), an esophagogastroduodenoscopy, a mammography, a Pap smear, a pelvic exam, or a positron emission tomography and computed tomography (PET-CT) scan. In some embodiments, a subject that has been selected for further diagnostic testing can also be selected for increased monitoring. Once the presence of a cancer cell has been identified (e.g., by any of the variety of methods disclosed herein), it may be beneficial for the subject to undergo both increased monitoring (e.g., to assess the progression of the tumor or cancer in the subject and/or to assess the development of additional cancer cell mutations), and further diagnostic testing (e.g., to determine the size and/or exact location of the tumor harboring the cancer cell). In some embodiments, a therapeutic intervention is administered to the subject that is selected for further diagnostic testing after a cancer cell mutation is detected. Any of the therapeutic interventions disclosed herein or known in the art can be administered. For example, a subject that has been selected for further diagnostic testing can be administered a further diagnostic test, and a therapeutic intervention can be administered if the presence of the cancer cell is confirmed.
Additionally or alternatively, a subject that has been selected for further diagnostic testing can be administered a therapeutic intervention, and can be further monitored as the therapeutic intervention progresses. In some embodiments, after a subject that has been selected for further diagnostic testing has been administered a therapeutic intervention, the additional testing will reveal one or more additional cancer cell mutations. In some embodiments, such one or more additional cancer cell mutations will provide cause to administer a different therapeutic intervention (e.g., a resistance mutation may arise in a cancer cell during the therapeutic intervention, which cancer cell harboring the resistance mutation is resistant to the original therapeutic intervention).

Once identified as having a cancer as described herein (e.g., based at least in part on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers) and/or the presence of aneuploidy), a mammal can be treated with one or more cancer treatments (also referred to herein as “therapeutic interventions”).

In certain aspects, provided herein are state-of-the-art tests that can detect mutations in cancer cells that are released into the bloodstream. In some embodiments, one or more cancer types can be detected. Assays as described herein can be used as a cancer screening test with improved sensitivity while retaining specificity. For example, such assays can combine detection of genetic biomarkers (e.g., mutations in circulating tumor DNA (ctDNA)) with detection of thresholded protein markers in plasma. In some embodiments, a genetic biomarker (e.g., a mutation in circulating tumor DNA (ctDNA)) is tested alone. In some embodiments, protein biomarkers are tested alone. In some embodiments, the combination of the genetic biomarker (e.g., a mutation in circulating tumor DNA (ctDNA)) and protein markers can be superior to any single marker. In an exemplary pilot study of 1,703 patients (1,240 cancer and 463 healthy controls), the ctDNA and protein biomarker panel had a sensitivity of 64% and a specificity of 99.35%.
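As an illustration of how sensitivity and specificity figures of this kind are computed, the following Python sketch applies the standard definitions. The cohort sizes are taken from the text; the per-outcome counts (794 detected cancers, 3 false positives) are hypothetical values chosen only because they are arithmetically consistent with the quoted percentages. They are not reported study data.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of subjects with cancer who test positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of healthy controls who test negative."""
    return true_negatives / (true_negatives + false_positives)


# Cohort sizes from the text.
CANCER_SUBJECTS = 1240
HEALTHY_CONTROLS = 463

# Hypothetical outcome counts (assumptions for illustration only).
detected_cancers = 794
false_positive_controls = 3

sens = sensitivity(detected_cancers, CANCER_SUBJECTS - detected_cancers)
spec = specificity(HEALTHY_CONTROLS - false_positive_controls,
                   false_positive_controls)

print(f"sensitivity = {sens:.2%}, specificity = {spec:.2%}")
```

With only 463 controls, each additional false positive shifts specificity by roughly 0.2 percentage points, which is one reason the threshold levels described herein are set conservatively.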

In some embodiments, assays as described herein may be applied to apparently healthy individuals. For example, assays as described herein may reduce deaths and suffering from cancer by detecting pre-symptomatic cancers through a blood test taken during routine office visits to physicians. Additionally, assays as described herein may be applied to patients with localized cancers, particularly those that have been or can be treated or resected. Assays as described herein may improve management and prognosis through the earlier detection of recurrence.

In some embodiments, genetic biomarkers (e.g., mutations in cell-free DNA (e.g., ctDNA)) may be tested from any of a variety of biological samples obtained from a subject (e.g., a human subject) including, but not limited to, blood, plasma, serum, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof.

In some embodiments, genetic biomarkers (e.g., mutations in ctDNA) in 10 exemplary genes (AKT1, APC, BRAF, CDKN2A, CTNNB1, FBXW7, FGF2, GNAS, HRAS, KRAS) and elevation of 11 exemplary protein markers in serum beyond a threshold (CA19-9 (>92 U/ml), CEA (>7,507 pg/ml), CA125 (>577 U/ml), AFP (>21,321 pg/ml), Prolactin (>145,345 pg/ml), HGF (>899 pg/ml), OPN (>157,772 pg/ml), TIMP-1 (>176,989 pg/ml), Follistatin (>1,970 pg/ml), G-CSF (>800 pg/ml), and CA15-3 (>98 U/ml)) may be tested in the assay. In some embodiments, genetic biomarkers (e.g., mutations in ctDNA) in 16 exemplary genes (AKT1, APC, BRAF, CDKN2A, CTNNB1, FBXW7, FGFR2, GNAS, HRAS, KRAS, PPP2R1A, TP53, PTEN, PIK3CA, EGFR and NRAS) and elevation of 11 exemplary protein biomarkers in serum beyond a threshold (CA19-9 (>92 U/ml), CEA (>7,507 pg/ml), CA125 (>577 U/ml), AFP (>21,321 pg/ml), Prolactin (>145,345 pg/ml), HGF (>899 pg/ml), OPN (>157,772 pg/ml), TIMP-1 (>176,989 pg/ml), Follistatin (>1,970 pg/ml), G-CSF (>800 pg/ml), and CA15-3 (>98 U/ml)) may be tested in the assay. In some embodiments, the presence of a genetic biomarker (e.g., a mutation) or an elevation beyond threshold of any one of the protein biomarkers constitutes a positive result (e.g., identification of cancer in a subject). In some embodiments, the presence of genetic biomarkers (e.g., mutations) or elevations in two or more protein biomarkers constitute a positive result. For example, the presence of genetic biomarkers (e.g., mutations) in two, three, four, five, six, seven, eight, nine, or ten exemplary genes and/or elevations in two, three, four, five, six, seven, eight, nine, ten, or eleven protein biomarkers constitute a positive result.
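The positive-result rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the assay's actual implementation: the threshold values are those listed in the text, while the function names, the input format (a set of mutated gene symbols and a mapping of marker names to measured levels), and the `min_elevated` parameter are hypothetical.

```python
# Exemplary thresholds from the text. Units: U/ml for CA19-9, CA125, and
# CA15-3; pg/ml for all other markers.
PROTEIN_THRESHOLDS = {
    "CA19-9": 92.0,
    "CEA": 7507.0,
    "CA125": 577.0,
    "AFP": 21321.0,
    "Prolactin": 145345.0,
    "HGF": 899.0,
    "OPN": 157772.0,
    "TIMP-1": 176989.0,
    "Follistatin": 1970.0,
    "G-CSF": 800.0,
    "CA15-3": 98.0,
}


def elevated_markers(levels: dict[str, float]) -> list[str]:
    """Return the markers whose measured level strictly exceeds its threshold."""
    return sorted(
        name
        for name, value in levels.items()
        if name in PROTEIN_THRESHOLDS and value > PROTEIN_THRESHOLDS[name]
    )


def is_positive(mutated_genes: set[str],
                levels: dict[str, float],
                min_elevated: int = 1) -> bool:
    """Positive if any mutation is present, or if at least `min_elevated`
    protein markers exceed their thresholds (hypothetical parameter)."""
    return bool(mutated_genes) or len(elevated_markers(levels)) >= min_elevated
```

Setting `min_elevated=2` corresponds to the embodiments in which elevations in two or more protein biomarkers are required for a positive result.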

In some embodiments, proteins (e.g., protein biomarkers) may be tested from any of a variety of biological samples obtained from a subject (e.g., a human subject) including, but not limited to, blood, plasma, serum, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. Proteins (e.g., protein biomarkers) that are found in high amounts in cancers can be tested for amounts of the proteins that do not occur in healthy human subjects. Examples of proteins (e.g., protein biomarkers), any one, two, three, four, five, six, seven, eight, nine, ten, or eleven of which may be tested, include, without limitation, carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3. Any protein biomarker known in the art may be used when a threshold value is obtained above which normal, healthy human subjects do not fall, but human subjects with cancer do fall.

In some embodiments, a threshold level of CA19-9 can be at least about 92 U/mL (e.g., about 92 U/mL). In some embodiments, a threshold level of CA19-9 can be 92 U/mL. In some embodiments, a threshold level of CEA can be at least about 7,507 pg/ml (e.g., about 7,507 pg/ml). In some embodiments, a threshold level of CEA can be 7.5 ng/mL. In some embodiments, a threshold level of HGF can be at least about 899 pg/ml (e.g., about 899 pg/ml). In some embodiments, a threshold level of HGF can be 0.92 ng/mL. In some embodiments, a threshold level of OPN can be at least about 157,772 pg/ml (e.g., about 157,772 pg/ml). In some embodiments, a threshold level of OPN can be 158 ng/mL. In some embodiments, a threshold level of CA125 can be at least about 577 U/ml (e.g., about 577 U/ml). In some embodiments, a threshold level of CA125 can be 577 U/mL. In some embodiments, a threshold level of AFP can be at least about 21,321 pg/ml (e.g., about 21,321 pg/ml). In some embodiments, a threshold level of AFP can be 21,321 pg/ml. In some embodiments, a threshold level of prolactin can be at least about 145,345 pg/ml (e.g., about 145,345 pg/ml). In some embodiments, a threshold level of prolactin can be 145,345 pg/ml. In some embodiments, a threshold level of TIMP-1 can be at least about 176,989 pg/ml (e.g., about 176,989 pg/ml). In some embodiments, a threshold level of TIMP-1 can be 176,989 pg/ml. In some embodiments, a threshold level of follistatin can be at least about 1,970 pg/ml (e.g., about 1,970 pg/ml). In some embodiments, a threshold level of follistatin can be 1,970 pg/ml. In some embodiments, a threshold level of G-CSF can be at least about 800 pg/ml (e.g., about 800 pg/ml). In some embodiments, a threshold level of G-CSF can be 800 pg/ml. In some embodiments, a threshold level of CA15-3 can be at least about 98 U/ml (e.g., about 98 U/ml).
In some embodiments, a threshold level of CA15-3 can be 98 U/ml. In some embodiments, a threshold level of CA19-9, CEA, and/or OPN can be 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100% or more greater than the threshold levels listed above (e.g., greater than a threshold level of 92 U/mL for CA19-9, 7,507 pg/ml for CEA, 899 pg/ml for HGF, 157,772 pg/ml for OPN, 577 U/ml for CA125, 21,321 pg/ml for AFP, 145,345 pg/ml for prolactin, 176,989 pg/ml for TIMP-1, 1,970 pg/ml for follistatin, 800 pg/ml for G-CSF, and/or 98 U/ml for CA15-3).

In some embodiments, a threshold level of protein biomarker can be greater than the levels that are typically tested for diagnostic or clinical purposes. For example, the threshold level of CA19-9 can be greater than about 37 U/ml (e.g., greater than about 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or more U/mL). Additionally or alternatively, the threshold level of CEA can be greater than about 2.5 ug/L (e.g., greater than about 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5 or more ug/L). Additionally or alternatively, the threshold level of CA125 can be greater than about 35 U/mL (e.g., greater than about 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550 or more U/mL). Additionally or alternatively, the threshold level of AFP can be greater than about 21 ng/mL (e.g., greater than about 25, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400 or more ng/L). Additionally or alternatively, the threshold level of TIMP-1 can be greater than about 2300 ng/mL (e.g., greater than about 2,500, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 25,000, 30,000, 35,000, 40,000 or more ng/L). Additionally or alternatively, the threshold level of follistatin can be greater than about 2 ug/mL (e.g., greater than about 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5 or more ug/L). Additionally or alternatively, the threshold level of CA15-3 can be greater than about 30 U/mL (e.g., greater than about 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or more U/mL). In some embodiments, detecting one or more protein biomarkers at threshold levels that are higher than are typically tested for during traditional diagnostic or clinical assays can improve the sensitivity of cancer detection.

In some embodiments, an assay includes detection of thresholded protein biomarkers in a biological sample (e.g., any biological sample disclosed herein such as plasma) without detection of genetic biomarkers (e.g., mutations in circulating tumor DNA (ctDNA)). For example, an assay may include detection of one or more of CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3 in a biological sample. In some embodiments, an assay may include detection of one or more of CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3 in a biological sample at any of the threshold levels disclosed herein. In some embodiments, once an assay that includes detection of thresholded protein biomarkers in a biological sample is performed, subsequent testing or monitoring is performed (e.g., any of the variety of further diagnostic testing or increased monitoring techniques disclosed herein). In some embodiments, once an assay that includes detection of thresholded protein markers in a biological sample is performed, a second assay that includes detecting a genetic biomarker (e.g., a genetic biomarker present in cell-free DNA (e.g., ctDNA)) can be performed (e.g., detecting any of the variety of genetic alterations in genetic biomarkers that are present in cell-free DNA or ctDNA as described herein).

In some embodiments, an assay includes detection of a genetic biomarker in circulating tumor DNA (ctDNA) in a biological sample (e.g., any biological sample disclosed herein such as plasma) without detection of thresholded protein biomarkers. For example, an assay may include detection of genetic biomarkers (e.g., genetic alterations) in one or more of any of the genes disclosed herein including, without limitation, CDKN2A, FGF2, GNAS, ABL1, EVI1, MYC, APC, IL2, TNFAIP3, ABL2, EWSR1, MYCL1, ARHGEF12, JAK2, TP53, AKT1, FEV, MYCN, ATM, MAP2K4, TSC1, AKT2, FGFR1, NCOA4, BCL11B, MDM4, TSC2, ATF1, FGFR1OP, NFKB2, BLM, MEN1, VHL, BCL11A, FGFR2, NRAS, BMPR1A, MLH1, WRN, BCL2, FUS, NTRK1, BRCA1, MSH2, WT1, BCL3, GOLGA5, NUP214, BRCA2, NF1, BCL6, GOPC, PAX8, CARS, NF2, BCR, HMGA1, PDGFB, CBFA2T3, NOTCH1, BRAF, HMGA2, PIK3CA, CDH1, NPM1, CARD11, HRAS, PIM1, CDH11, NR4A3, CBLB, IRF4, PLAG1, CDK6, NUP98, CBLC, JUN, PPARG, SMAD4, PALB2, CCND1, KIT, PTPN11, CEBPA, PML, CCND2, KRAS, RAF1, CHEK2, PTEN, CCND3, LCK, REL, CREB1, RB1, CDX2, LMO2, RET, CREBBP, RUNX1, CTNNB1, MAF, ROS1, CYLD, SDHB, DDB2, MAFB, SMO, DDX5, SDHD, DDIT3, MAML2, SS18, EXT1, SMARCA4, DDX6, MDM2, TCL1A, EXT2, SMARCB1, DEK, MET, TET2, FBXW7, SOCS1, EGFR, MITF, TFG, FH, STK11, ELK4, MLL, TLX1, FLT3, SUFU, ERBB2, MPL, TPR, FOXP1, SUZ12, ETV4, MYB, USP6, GPC3, SYK, ETV6, IDH1, and/or TCF3. In some embodiments, an assay may include detection of genetic biomarkers (e.g., genetic alterations) in one or more of AKT1, APC, BRAF, CDKN2A, CTNNB1, FBXW7, FGF2, GNAS, HRAS, KRAS. In some embodiments, an assay may include detection of genetic biomarkers (e.g., genetic alterations) in one or more of AKT1, APC, BRAF, CDKN2A, CTNNB1, FBXW7, FGFR2, GNAS, HRAS, KRAS, PPP2R1A, TP53, PTEN, PIK3CA, EGFR and NRAS.
In some embodiments, once an assay that includes detection of a genetic biomarker present in ctDNA in a biological sample is performed, subsequent testing or monitoring is performed (e.g., any of the variety of further diagnostic testing or increased monitoring techniques disclosed herein). In some embodiments, once an assay that includes detection of a genetic biomarker present in ctDNA in a biological sample is performed, a second assay that includes detecting protein biomarkers at high thresholds can be performed (e.g., detecting any of the variety of protein biomarkers described herein including, but not limited to, carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3 and combinations thereof).

In some embodiments, at least two codons or at least two amplicons of a tumor suppressor gene or an oncogene can be tested. In some embodiments, at least three, at least four, or at least five or more codons or amplicons of an individual tumor suppressor gene or oncogene can be assayed. In some embodiments, the more distributed the mutations are in a gene, the more codons or amplicons one may desirably test.

Exemplary codons of tumor suppressor genes and oncogenes which may be tested include, without limitation, one or more of the following codons and their surrounding splice sites: codons 16-18 of AKT1; codons 1304-1311, 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58, 76-88 of CDKN2A; codons 31-39, 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, 143-148 of KRAS; codons 3-15, 54-63 of NRAS; codons 80-90, 343-348, 541-551, 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, 145-154 of PTEN; and codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, 374-386 of TP53. All or some of these regions may be tested. In some embodiments, the mutation can be in KRAS, e.g., in codon 12 or 61. In other embodiments, the mutation may be in other codons of KRAS.
In some embodiments, the mutation can be in CDKN2A (e.g., any of the CDKN2A mutations identified in Example 2). In some embodiments, the mutation may be in tumor suppressors or oncogenes, including but not limited to ABL1; EVI1; MYC; APC; IL2; TNFAIP3; ABL2; EWSR1; MYCL1; ARHGEF12; JAK2; TP53; AKT1; FEV; MYCN; ATM; MAP2K4; TSC1; AKT2; FGFR1; NCOA4; BCL11B; MDM4; TSC2; ATF1; FGFR1OP; NFKB2; BLM; MEN1; VHL; BCL11A; FGFR2; NRAS; BMPR1A; MLH1; WRN; BCL2; FUS; NTRK1; BRCA1; MSH2; WT1; BCL3; GOLGA5; NUP214; BRCA2; NF1; BCL6; GOPC; PAX8; CARS; NF2; BCR; HMGA1; PDGFB; CBFA2T3; NOTCH1; BRAF; HMGA2; PIK3CA; CDH1; NPM1; CARD11; HRAS; PIM1; CDH11; NR4A3; CBLB; IRF4; PLAG1; CDK6; NUP98; CBLC; JUN; PPARG; SMAD4; PALB2; CCND1; KIT; PTPN11; CEBPA; PML; CCND2; KRAS; RAF1; CHEK2; PTEN; CCND3; LCK; REL; CREB1; RB1; CDX2; LMO2; RET; CREBBP; RUNX1; CTNNB1; MAF; ROS1; CYLD; SDHB; DDB2; MAFB; SMO; DDX5; SDHD; DDIT3; MAML2; SS18; EXT1; SMARCA4; DDX6; MDM2; TCL1A; EXT2; SMARCB1; DEK; MET; TET2; FBXW7; SOCS1; EGFR; MITF; TFG; FH; STK11; ELK4; MLL; TLX1; FLT3; SUFU; ERBB2; MPL; TPR; FOXP1; SUZ12; ETV4; MYB; USP6; GPC3; SYK; ETV6; IDH1; and TCF3. Testing may include amplification and/or sequencing.

In some embodiments, sequence determination to a high degree of accuracy can be advantageous when analytes are present in low quantities and/or fractions. High accuracy sequence determination may employ oligonucleotide barcodes, whether endogenous or exogenous. These may be introduced into a template analyte by amplification, for example, in the case of an exogenous barcode. Alternatively, an endogenous oligonucleotide barcode may be used by attaching to it, for example, by means of ligation, an oligonucleotide adapter molecule. The adapter molecule may contain a priming site for DNA synthesis, and/or for hybridization to a solid surface. The adapter can be immediately adjacent to the endogenous barcode or a fixed number of nucleotides from the endogenous barcode.

In some embodiments, oligonucleotide barcodes permit the labeling of individual template molecules in the sample prior to processing, in particular amplification. For example, by demanding that all or a high proportion or a threshold proportion of family members (having the same oligonucleotide barcode) display a mutation, it is possible to filter out or minimize false positive mutations that arise during amplification and/or other DNA synthesis or processing. See, e.g., Kinde I, Wu J, Papadopoulos N, Kinzler K W, & Vogelstein B (2011) Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci USA 108(23):9530-9535, the content of which is explicitly incorporated by reference. Additionally or alternatively, a threshold for mutation calling can require that a mutation occur in two different families. Multiple filters of this nature may be applied.
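The barcode-family filters described above can be sketched in Python as follows. This is a simplified illustration, not the published pipeline: the input format (one `(barcode, mutation)` pair per read, with `None` meaning no mutation was observed), the function name, and the default family-fraction cutoff of 0.95 are assumptions made for the example. The two-family requirement follows the text.

```python
from collections import defaultdict
from typing import Iterable, Optional


def call_mutations(reads: Iterable[tuple[str, Optional[str]]],
                   min_family_fraction: float = 0.95,
                   min_families: int = 2) -> set[str]:
    """Call mutations supported by enough barcode families.

    A family (all reads sharing one barcode) supports a mutation only if at
    least `min_family_fraction` of its members display that mutation. A
    mutation is called only if at least `min_families` families support it.
    """
    # Group reads into families by barcode.
    families: dict[str, list[Optional[str]]] = defaultdict(list)
    for barcode, mutation in reads:
        families[barcode].append(mutation)

    # Count the families that support each mutation.
    support: dict[str, int] = defaultdict(int)
    for members in families.values():
        for mutation in set(members) - {None}:
            if members.count(mutation) / len(members) >= min_family_fraction:
                support[mutation] += 1

    return {m for m, n in support.items() if n >= min_families}


# Hypothetical reads: the KRAS change appears in every member of two
# families; the TP53 change appears in one of three reads of one family,
# as would be expected of an amplification error.
example_reads = (
    [("AAAA", "KRAS G12D")] * 3
    + [("CCCC", "KRAS G12D")] * 2
    + [("GGGG", "TP53 R175H"), ("GGGG", None), ("GGGG", None)]
)
print(call_mutations(example_reads))
```

An error introduced during amplification typically appears in only a subset of the reads within one family, so it fails the within-family fraction filter even before the two-family filter is applied.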

In some embodiments, methods provided herein can be used to detect a genetic biomarker (e.g., a genetic alteration (e.g., one or more genetic alterations)) in circulating tumor DNA present in cell-free DNA, where the cell-free DNA is present in an amount less than about 1500 ng, e.g., less than about 1400 ng, less than about 1300 ng, less than about 1200 ng, less than about 1100 ng, less than about 1000 ng, less than about 900 ng, less than about 800 ng, less than about 700 ng, less than about 600 ng, less than about 500 ng, less than about 400 ng, less than about 300 ng, less than about 200 ng, less than about 150 ng, less than about 100 ng, less than about 95 ng, less than about 90 ng, less than about 85 ng, less than about 80 ng, less than about 75 ng, less than about 70 ng, less than about 65 ng, less than about 60 ng, less than about 55 ng, less than about 50 ng, less than about 45 ng, less than about 40 ng, less than about 35 ng, less than about 30 ng, less than about 25 ng, less than about 20 ng, less than about 15 ng, less than about 10 ng, or less than about 5 ng. In some embodiments, methods provided herein can be used to detect a genetic biomarker (e.g., a genetic alteration (e.g., one or more genetic alterations)) in circulating tumor DNA present in cell-free DNA, where the circulating tumor DNA represents 100% of the cell-free DNA. In some embodiments, methods provided herein can be used to detect a genetic biomarker (e.g., a genetic alteration (e.g., one or more genetic alterations)) in circulating tumor DNA present in cell-free DNA, where the circulating tumor DNA represents less than 100% of the cell-free DNA, e.g., about 95%, about 90%, about 85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%, about 50%, about 45%, about 40%, about 35%, about 30%, about 25%, about 20%, about 15%, about 10%, about 5%, about 4%, about 3%, about 2%, about 1%, about 0.95%, about 0.90%, about 0.85%, about 0.80%, about 0.75%, about 0.70%, about 0.65%, about 0.60%, about 0.55%, about 0.50%, about 0.45%, about 0.40%, about 0.35%, about 0.30%, about 0.25%, about 0.20%, about 0.15%, about 0.10%, about 0.09%, about 0.08%, about 0.07%, about 0.06%, about 0.05% of the cell-free DNA, or less.
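To give a sense of scale for the quantities and fractions listed above, the following Python sketch estimates how many mutant template molecules a sample might contain. It rests on assumptions not stated in the text: a haploid human genome is taken to weigh roughly 3.3 pg (a commonly used approximation), and, for simplicity, every tumor-derived copy of the locus is assumed to carry the mutation unless a smaller `mutant_copy_fraction` is supplied. The function and parameter names are hypothetical.

```python
# Assumption: approximate mass of one haploid human genome, in picograms.
PG_PER_HAPLOID_GENOME = 3.3


def haploid_genome_equivalents(cfdna_ng: float) -> float:
    """Approximate number of haploid genome copies in a mass of cell-free DNA."""
    return cfdna_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def expected_mutant_templates(cfdna_ng: float,
                              ctdna_fraction: float,
                              mutant_copy_fraction: float = 1.0) -> float:
    """Expected mutant template molecules at a single locus.

    ctdna_fraction: fraction of cell-free DNA that is tumor derived (0 to 1).
    mutant_copy_fraction: fraction of tumor-derived copies carrying the
    mutation (a simplifying assumption; 0.5 would model a heterozygous change).
    """
    return (haploid_genome_equivalents(cfdna_ng)
            * ctdna_fraction
            * mutant_copy_fraction)


# 5 ng of cell-free DNA with 0.05% ctDNA: fewer than one mutant template
# is expected under these assumptions.
print(round(expected_mutant_templates(5.0, 0.0005), 2))
```

Under these assumptions, samples at the low end of both ranges may contain only a handful of mutant templates or none, which is why high-accuracy sequence determination matters when analytes are present in low quantities and/or fractions.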

In some embodiments, one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA) and/or one or more protein biomarkers can be tested from any of a variety of biological samples isolated or obtained from a subject (e.g., a human subject) including, but not limited to, blood, plasma, serum, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. In some embodiments, one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA) and one or more protein biomarkers can be tested from the same sample. For example, a single sample can be isolated or obtained from a subject, which single sample can be tested for one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA), one or more protein biomarkers, or both. One or more genetic biomarkers present in cell-free DNA (e.g., ctDNA) and one or more protein biomarkers can be tested from the sample at the same time or at different times. For example, the sample can be tested for one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA) at a first time, and for one or more protein biomarkers at a second time, or vice versa. In some embodiments, the sample can be refrigerated, frozen, or otherwise stored for future testing. In some embodiments, one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA) and one or more protein biomarkers can be tested from different samples. For example, a first sample can be isolated or obtained from a subject and tested for one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA), and a second sample can be isolated or obtained from the subject and tested for one or more protein biomarkers. The first and second samples can be of the same type (e.g., plasma or serum), or of different types. The first and/or second samples can be refrigerated, frozen, or otherwise stored for future testing.

In some embodiments, any of the variety of assays disclosed herein can be repeated to increase the accuracy of mutation detection. Assays may be done in duplicate or triplicate, for example. In some embodiments, positive assays can be repeated on the same initial sample from a patient. Additionally or alternatively, a second sample may be obtained from a patient at a later time, for example, when a positive result is found. Any of the variety of assays described herein, including assays for ctDNA and/or protein biomarkers, may be repeated or run in parallel replicates.

In some embodiments, a radiologic, sonographic, or other technique may be applied to any subject (e.g., a human subject) in which a mutation is detected. The technique may be applied to the whole body, to a single organ, or to a region of the body. The technique may be used, for example, to ascertain that a particular type of cancer is present, to confirm that a cancer is present, or to identify the location of a cancer in the body. In some embodiments, the technique is a scan. In some embodiments, the scan is a computed tomography (CT), a CT angiography (CTA), an esophagram (a barium swallow), a barium enema, a magnetic resonance imaging (MRI), a PET scan, an ultrasound (e.g., an endobronchial ultrasound, an endoscopic ultrasound), an X-ray, a DEXA scan, or a positron emission tomography and computed tomography (PET-CT) scan. In some embodiments, the technique is a physical examination, such as an anoscopy, a bronchoscopy (e.g., an autofluorescence bronchoscopy, a white-light bronchoscopy, a navigational bronchoscopy), a colonoscopy, a digital breast tomosynthesis, an endoscopic retrograde cholangiopancreatography (ERCP), an esophagogastroduodenoscopy, a mammography, a Pap smear, or a pelvic exam. In some embodiments, the technique is a biopsy (e.g., a bone marrow aspiration, a tissue biopsy). In some embodiments, the biopsy is performed by fine needle aspiration or by surgical excision. In some embodiments, the technique further includes obtaining a biological sample (e.g., a tissue sample, a urine sample, a blood sample, a cheek swab, a saliva sample, a mucosal sample (e.g., sputum, bronchial secretion), a nipple aspirate, a secretion or an excretion). In some embodiments, the technique includes determining exosomal proteins (e.g., an exosomal surface protein (e.g., CD24, CD147, PCA-3)) (Soung et al. (2017) Cancers 9(1):pii:E8). In some embodiments, the diagnostic testing method is an oncotype DX® test (Baehner (2016) Ecancermedicalscience 10:675).

In some embodiments, various methods described herein can be used to detect cancers selected from the group consisting of: pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, and breast cancer, and combinations thereof.

In some embodiments, methods provided herein (e.g., methods in which genetic biomarkers present in cell-free DNA (e.g., ctDNA) and high threshold protein biomarkers are detected in a biological sample isolated from the subject) can be used for selecting a treatment for a subject. For example, once a subject has been determined to have cancer (e.g., pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, or breast cancer) by any of the variety of methods disclosed herein, an appropriate treatment can be selected (e.g., any of the variety of therapeutic interventions described herein). In some embodiments, methods provided herein (e.g., methods in which genetic biomarkers present in cell-free DNA (e.g., ctDNA) and high threshold protein biomarkers are detected in a biological sample isolated from the subject) can be used for selecting a subject for treatment. For example, once a subject has been determined to have cancer (e.g., pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, or breast cancer) by any of the variety of methods disclosed herein, that subject can be identified as an appropriate subject to receive a treatment (e.g., any of the variety of therapeutic interventions described herein). In some embodiments, methods provided herein (e.g., methods in which genetic biomarkers present in cell-free DNA (e.g., ctDNA) and high threshold protein biomarkers are detected in a biological sample isolated from the subject) can be used for selecting a subject for increased monitoring. For example, once a subject has been determined to have cancer (e.g., pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, or breast cancer) by any of the variety of methods disclosed herein, that subject can be identified as an appropriate subject to receive increased monitoring (e.g., any of the variety of monitoring techniques described herein).
In some embodiments, methods providedherein (e.g., methods in which genetic biomarkers present in cell-freeDNA (e.g., ctDNA) and high threshold protein biomarkers are detected ina biological sample isolated from the subject) can be used for selectinga subject for further diagnostic testing. For example, once a subjecthas been determined to have cancer (e.g., pancreatic cancer, coloncancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer,lung cancer, or breast cancer) by any of the variety of methodsdisclosed herein, that subject can be identified as an appropriatesubject to receive further diagnostic testing (e.g., any of the varietyof diagnostic techniques described herein).

In some embodiments, methods provided herein can be used to detect the presence of cancer (e.g., pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, or breast cancer) at a time period prior to diagnosis of the subject with an early-stage cancer and/or at a time prior to the subject exhibiting symptoms associated with cancer. For example, methods provided herein can be used when a subject has not been diagnosed with cancer and/or when a subject is not known to harbor a cancer cell.

In some embodiments, certain protein biomarkers can be detected at high threshold levels to detect specific types of cancers. For example, high threshold levels of CA19-9 can be detected to indicate the presence of pancreatic cancer. Additionally or alternatively, high threshold levels of CEA can be detected to indicate the presence of, for example, colon, gastric, pancreatic, lung, and/or breast cancer. Additionally or alternatively, high threshold levels of CA-125 can be detected to indicate the presence of, for example, ovarian cancer. Additionally or alternatively, high threshold levels of AFP can be detected to indicate the presence of liver cancer. Additionally or alternatively, high threshold levels of prolactin can be detected to indicate the presence of, for example, ovarian, breast, and/or lung cancer. Additionally or alternatively, high threshold levels of HGF can be detected to indicate the presence of, for example, esophageal, gastric, and/or liver cancer. Additionally or alternatively, high threshold levels of OPN can be detected to indicate the presence of, for example, ovarian, breast, and/or lung cancer. Additionally or alternatively, high threshold levels of TIMP-1 can be detected to indicate the presence of, for example, colon and/or pancreatic cancer. Additionally or alternatively, high threshold levels of follistatin can be detected to indicate the presence of, for example, ovarian and/or lung cancer. Additionally or alternatively, high threshold levels of G-CSF can be detected to indicate the presence of, for example, ovarian cancer. Additionally or alternatively, high threshold levels of CA15-3 can be detected to indicate the presence of, for example, breast cancer. Exemplary protein biomarkers detected in various cancer types are shown in Example 2.
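The marker-to-cancer-type associations listed above can be held in a simple lookup table. The following is a minimal sketch only: the associations are transcribed from the preceding paragraph (with HGF denoting hepatocyte growth factor), while the function name `candidate_cancer_types` and the idea of tallying how many elevated markers point to each cancer type are hypothetical illustrations, not a classifier described in this disclosure.

```python
# Marker-to-cancer-type associations transcribed from the paragraph above.
MARKER_CANCER_TYPES = {
    "CA19-9": {"pancreatic"},
    "CEA": {"colon", "gastric", "pancreatic", "lung", "breast"},
    "CA-125": {"ovarian"},
    "AFP": {"liver"},
    "prolactin": {"ovarian", "breast", "lung"},
    "HGF": {"esophageal", "gastric", "liver"},
    "OPN": {"ovarian", "breast", "lung"},
    "TIMP-1": {"colon", "pancreatic"},
    "follistatin": {"ovarian", "lung"},
    "G-CSF": {"ovarian"},
    "CA15-3": {"breast"},
}


def candidate_cancer_types(elevated_markers):
    """Tally, per cancer type, how many elevated markers are associated with it.

    Hypothetical helper: returns (cancer_type, count) pairs sorted by
    descending count, then alphabetically. Unknown markers are ignored.
    """
    counts = {}
    for marker in elevated_markers:
        for cancer in MARKER_CANCER_TYPES.get(marker, ()):
            counts[cancer] = counts.get(cancer, 0) + 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Example: a sample scored positive for CEA and TIMP-1.
    for cancer, n in candidate_cancer_types(["CEA", "TIMP-1"]):
        print(cancer, n)
```

Such a tally only narrows the list of candidate tumor sites; it does not replace imaging or other follow-up testing for localization.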

In some embodiments, assays for genetic biomarkers (e.g., genetic alterations) can be combined with assays for elevated protein biomarkers to increase the sensitivity of a blood test for low-stage pancreatic cancers. In some embodiments, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or more of such cancers can be detected through this combination test, including those in some patients with a favorable prognosis. In some embodiments, 64% of such cancers can be detected through this combination test, including those in some patients with a favorable prognosis. One of the design features of certain studies presented herein was that only patients with resectable pancreatic cancers were included, and patients with advanced disease (i.e., Stage III or IV) were excluded. Though this exclusion reduced the sensitivity that could otherwise be achieved by evaluating all pancreatic cancer patients, regardless of stage, the resectable cases represent a promising group with advantageous clinical relevance with respect to evaluating a screening technology. In some embodiments, methods provided herein can be used to detect all pancreatic cancers in subjects (e.g., human subjects).

Whether combining genetic biomarkers present in ctDNA and protein markers could increase sensitivity over either alone was not known prior to the present disclosure. In fact, it was conceivable that the same patients with detectable circulating protein markers would largely overlap those releasing DNA into the circulation. This was of particular concern for early stage cancer patients, because both ctDNA and protein-based markers are known to be considerably higher in patients with advanced cancers compared to those with earlier stage cancers (Lennon A M & Goggins M (2010) Diagnostic and Therapeutic Response Markers. Pancreatic Cancer, (Springer New York, N.Y., N.Y.), pp 675-701; Locker G Y, et al. (2006) ASCO 2006 update of recommendations for the use of tumor markers in gastrointestinal cancer. J Clin Oncol 24(33):5313-5327; Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224).

In some embodiments of the methods provided herein, very highspecificity (e.g., 99.5%: 95% CI 97-100%) can be achieved. For example,only one false positive among 182 healthy individuals of average age 64was observed in the studies presented herein. Given the relativeinfrequency of cancer in the general population, the specificity of anypotentially useful blood-based screening test for pancreatic cancer ispreferably high, e.g., preferably >99%. Otherwise, the number of falsepositives would greatly exceed the number of true positives (i.e., havesuboptimal positive predictive value) (Lennon A M, et al. (2014) TheEarly Detection of Pancreatic Cancer: What Will It Take to Diagnose andTreat Curable Pancreatic Neoplasia? Cancer Res 74(13):3381-3389). Suchstringency for screening tests is not typically required for tests tomonitor disease in patients with known cancer. For monitoring,specificity can be relaxed somewhat in the interest of obtaining highersensitivity. High specificity was achieved with various methodsdisclosed herein in at least two ways. First, ctDNA was used as one ofthe components of the test. KRAS mutations are exquisitely specific forneoplasia and their specificity has traditionally been limited bytechnical rather than biological factors. The incorporation of molecularbarcoding into various assays described herein (e.g., using a Safe-SeqStechnique) can minimize the false positive results from sequencing thathave traditionally been major technical issues confronting anyctDNA-based assays. KRAS mutations are particularly suitable for earlydetection strategies because they are rarely found in clones arisingduring age-associated clonal hematopoiesis. Such clones, which mayrepresent early forms of myelodysplasia, are a potential source of falsepositive ctDNA assays. 
The vast majority of such mutations occur withinnine genes (DNMT3A, TET2, JAK2, ASXL1, TP53, GNAS, PPM1D, BCORL1 andSF3B1) (48-50), posing challenges for the use of these genes asbiomarkers in ctDNA-based assays. Second, high thresholds were used forscoring the protein markers as positive. These thresholds were based onprior studies in the literature or on an independent set of controls,permitting avoidance of positive scores in the vast majority of healthypatients (Kim J E, et al. (2004) Clinical usefulness of carbohydrateantigen 19-9 as a screening test for pancreatic cancer in anasymptomatic population. J Gastroenterol Hepatol 19(2):182-186). In someembodiments, such high thresholds can be used without an overallreduction in sensitivity because the ctDNA assay added sensitivity onits own and the ctDNA-positive cases only partially overlapped theprotein-biomarker-positive cases (See, e.g., FIG. 9, FIG. 10, andExample 2).
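The specificity figure discussed above (one false positive among 182 healthy individuals) can be checked with a short calculation. The sketch below computes the point estimate and an exact (Clopper-Pearson) two-sided 95% confidence interval using only the standard library. The choice of the Clopper-Pearson method and the function names are assumptions made for illustration; the disclosure does not state how its interval was derived.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion.

    Assumption: the Clopper-Pearson method is used here as one common choice.
    """

    def solve(k, target):
        # binom_cdf(k, n, p) decreases monotonically in p, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if successes == 0 else solve(successes - 1, 1 - alpha / 2)
    upper = 1.0 if successes == n else solve(successes, alpha / 2)
    return lower, upper


if __name__ == "__main__":
    true_negatives, healthy = 181, 182  # one false positive among 182 controls
    low, high = clopper_pearson(true_negatives, healthy)
    print(f"specificity = {true_negatives / healthy:.1%}")
    print(f"95% CI = {low:.1%} to {high:.1%}")
```

Under these assumptions the interval is consistent with the approximately 97-100% range quoted above.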

Protein biomarkers have been combined with each other in the past to achieve higher sensitivity (Dong T, Liu C C, Petricoin E F, & Tang L L (2014) Combining markers with and without the limit of detection. Stat Med 33(8):1307-1320). For example, it was shown that combining CA19-9 and TIMP-1 was more sensitive for the detection of PDAC than either biomarker alone (Zhou W, et al. (1998) Identifying markers for pancreatic cancer by gene expression analysis. Cancer Epidemiol Biomarkers Prev 7(2):109-112). More recently, it was shown that the combination of CA19-9, TIMP-1, and LRG-1 was more sensitive for the detection of early PDAC than CA19-9 alone (Dong T, Liu C C, Petricoin E F, & Tang L L (2014) Combining markers with and without the limit of detection. Stat Med 33(8):1307-1320). The combination of protein biomarkers with ultrasensitive ctDNA, as disclosed herein, is different. A recent study evaluated a combination of ctDNA and CA19-9 for pancreatic cancer but found no benefit to combining the biomarkers over CA19-9 alone. Without being bound by theory, it is possible this conclusion was reached due to inadequate sensitivity of the test used in detecting KRAS mutations (Le Calvez-Kelm F, et al. (2016) KRAS mutations in blood circulating cell-free DNA: a pancreatic cancer case-control. Oncotarget 7(48):78827-78840). Furthermore, the specificity for ctDNA achieved in that study was relatively low, reducing its suitability for screening.

In some embodiments, methods provided herein can be used to detect resectable cancers through a non-invasive blood test in a majority of patients.

In some embodiments, results obtained using any of the variety ofmethods disclosed herein can underestimate the survival benefits ofearly detection. The majority of the patients that were studied herein,even though they had resectable cancers, were symptomatic and theircancers were discovered only by virtue of their symptoms. Accordingly,77% of patients in the cohort described herein were Stage IIB and themedian size of tumors in these patients was 3 cm. In some embodiments,in a screening study of asymptomatic individuals, a greater proportionof earlier stage patients, with smaller tumors, can be discovered usingany of the variety of methods disclosed herein. In some embodiments, anyof the variety of methods disclosed herein can be more sensitive for thedetection of patients with larger tumors and with a poorer prognosisthan for patients with smaller tumors, even though all tumors can besurgically resectable (See, e.g., FIG. 9B, and Example 2). In someembodiments, KRAS mutations can be found in the circulation of patientswith cancer types other than those of the pancreas, primarily those ofthe lung (Herbst R S, Heymach J V, & Lippman S M (2008) Lung cancer. NEngl J Med 359(13):1367-1380), and CA19-9, CEA, HGF, and OPN expressioncan be elevated in several other cancer types (Kim J E, et al. (2004)Clinical usefulness of carbohydrate antigen 19-9 as a screening test forpancreatic cancer in an asymptomatic population. J Gastroenterol Hepatol19(2):182-186; Thomas D S, et al. (2015) Evaluation of serum CEA,CYFRA21-1 and CA125 for the early detection of colorectal cancer usinglongitudinal preclinical samples. Br J Cancer 113(2):268-274; Di Renzo MF, et al. (1995) Overexpression and amplification of the met/HGFreceptor gene during the progression of colorectal cancer. Clin CancerRes 1(2):147-154; El-Tanani M K, et al. (2006) The regulation and roleof osteopontin in malignant transformation and cancer. Cytokine GrowthFactor Rev 17(6):463-474). 
Thus, in some embodiments, patients testing positive using any of the variety of methods disclosed herein can undergo additional appropriate imaging studies to identify tumor localization.

In some embodiments, methods provided herein lay a foundation for evaluation of patients at high risk for PDAC, and for implementation of early detection strategies (Kalinich M, et al. (2017) An RNA-based signature enables high specificity detection of circulating tumor cells in hepatocellular carcinoma. Proc Natl Acad Sci USA 114(5):1123-1128). As an example, new-onset diabetes is known to be associated with an increased risk for pancreatic cancer. Approximately 1% of diabetic patients aged 50 and older are diagnosed with pancreatic cancer within 3 years of first meeting criteria for diabetes (Chari S T, et al. (2005) Probability of pancreatic cancer following diabetes: a population-based study. Gastroenterology 129(2):504-511). With an incidence of 1%, the positive predictive value (PPV) and negative predictive value (NPV) of certain combination assays disclosed herein are expected to be 54% and 99.6%, respectively, in this population, which is well within the range of currently approved screening tests for cancers.
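The predictive values quoted above follow from Bayes' rule once a sensitivity, a specificity, and a prevalence are fixed. The following minimal sketch assumes a sensitivity of 64% and a specificity of 181/182 (one false positive among 182 healthy controls), both taken from figures stated elsewhere in this section, together with the 1% incidence given above. The disclosure does not spell out which inputs were used for its estimate, so these inputs and the function name `predictive_values` are illustrative assumptions.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population.

    All arguments are fractions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Assumed inputs: 64% sensitivity, 181/182 specificity, 1% prevalence.
    ppv, npv = predictive_values(0.64, 181 / 182, 0.01)
    print(f"PPV = {ppv:.0%}, NPV = {npv:.1%}")
```

With these assumed inputs the calculation gives a PPV of about 54% and an NPV of about 99.6%. Lowering the specificity even slightly reduces the PPV sharply at low prevalence, which is why a specificity above 99% is preferred for screening.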

Available evidence indicates that many cancers have detectable genetic biomarkers present in ctDNA in their earliest stages, often more commonly than observed in pancreatic cancer (Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224). Similarly, a large number of protein biomarkers have already been described for the detection of numerous cancer types (Liotta L A & Petricoin E F, 3rd (2003) The promise of proteomics. Clin Adv Hematol Oncol 1(8):460-462). These protein biomarkers can be thresholded according to any of the variety of methods described herein, permitting the use of ctDNA-protein combinations to detect a variety of cancer types (Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224).

In certain aspects, provided herein are assays that can be used as a cancer screening test with improved sensitivity while retaining specificity. In some embodiments, the assays combine detection of mutations in genetic biomarkers present in circulating tumor DNA (ctDNA) with detection of thresholded protein biomarkers in plasma. In some embodiments, ctDNA is tested for the presence of genetic biomarkers alone. In some embodiments, protein biomarkers are tested alone. In some embodiments, the combination of the genetic biomarkers present in ctDNA and protein markers can be superior to any single marker. For example, in some embodiments the combination can detect nearly two-thirds of pancreatic cancers that have no evidence of distant metastasis at the time of surgical resection. In some embodiments, sequence determination to a high degree of accuracy can be advantageous when analytes are present in low quantities and/or fractions. High accuracy sequence determination may employ oligonucleotide barcodes, whether endogenous or exogenous. These may be introduced into a template analyte by amplification, for example, in the case of an exogenous barcode. Alternatively, an endogenous oligonucleotide barcode may be used by attaching to it, for example, by means of ligation, an oligonucleotide adapter molecule. The adapter molecule may contain a priming site for DNA synthesis, and/or for hybridization to a solid surface. The adapter can be immediately adjacent to the endogenous barcode or a fixed number of nucleotides from the endogenous barcode.

In some embodiments, oligonucleotide barcodes permit the labeling of individual template molecules in the sample prior to processing, in particular amplification. For example, by demanding that all, a high proportion, or a threshold proportion of family members (having the same oligonucleotide barcode) display a mutation, it is possible to filter out or minimize false positive mutations that arise during amplification and/or other DNA synthesis or processing. See, e.g., Kinde I, Wu J, Papadopoulos N, Kinzler K W, & Vogelstein B (2011) Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci USA 108(23):9530-9535, the content of which is explicitly incorporated by reference. Additionally or alternatively, a threshold for mutation calling can require that a mutation occur in two different families. Multiple filters of this nature may be applied.
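The family-based filtering described above can be sketched in a few lines. In this illustration each sequencing read for one candidate mutation is reduced to a pair of (barcode, whether the read shows the mutation). Reads are grouped into families by barcode, a family supports the mutation only if a threshold proportion of its members show it, and the mutation is called only if enough independent families support it. The function names and the default values (`min_family_size=2`, `min_fraction=0.95`, `min_families=2`) are hypothetical choices for the example and are not parameters specified in this disclosure.

```python
from collections import defaultdict


def supporting_families(reads, min_family_size=2, min_fraction=0.95):
    """Return barcodes of families in which the mutation is consistently seen.

    reads: iterable of (barcode, has_mutation) pairs for a single candidate
    mutation. Thresholds are illustrative assumptions.
    """
    families = defaultdict(list)
    for barcode, has_mutation in reads:
        families[barcode].append(bool(has_mutation))

    supporting = []
    for barcode, calls in families.items():
        if len(calls) < min_family_size:
            continue  # too few reads to distinguish a real variant from an error
        if sum(calls) / len(calls) >= min_fraction:
            supporting.append(barcode)
    return supporting


def call_mutation(reads, min_families=2, **family_filters):
    """Call the mutation only if enough independent families support it."""
    return len(supporting_families(reads, **family_filters)) >= min_families


if __name__ == "__main__":
    reads = (
        [("A", True)] * 4                               # consistent family
        + [("B", True), ("B", False), ("B", False)]     # likely amplification error
        + [("C", True)] * 3                             # consistent family
        + [("D", True)]                                 # singleton, ignored
    )
    print(supporting_families(reads))
    print(call_mutation(reads))
```

An error introduced during amplification typically appears in only a subset of the reads sharing a barcode (family B above), whereas a mutation present in the original template appears in essentially all of them.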

In some embodiments, methods provided herein can be used to detect agenetic alteration (e.g., one or more genetic alterations) incirculating tumor DNA present in cell-free DNA, where the cell-free DNAis present in an amount less than about 1500 ng, e.g., less than about1400 ng, less than about 1300 ng, less than about 1200 ng, less thanabout 1100 ng, less than about 1000 ng, less than about 900 ng, lessthan about 800 ng, less than about 700 ng, less than about 600 ng, lessthan about 500 ng, less than about 400 ng, less than about 300 ng, lessthan about 200 ng, less than about 150 ng, less than about 100 ng, lessthan about 95 ng, less than about 90 ng, less than about 85 ng, lessthan about 80 ng, less than about 75 ng, less than about 70 ng, lessthan about 65 ng, less than about 60 ng, less than about 55 ng, lessthan about 50 ng, less than about 45 ng, less than about 40 ng, lessthan about 35 ng, less than about 30 ng, less than about 25 ng, lessthan about 20 ng, less than about 15 ng, less than about 10 ng, or lessthan about 5 ng. In some embodiments, methods provided herein can beused to detect a genetic alteration (e.g., one or more geneticalterations) in circulating tumor DNA present in cell-free DNA, wherethe circulating tumor DNA represents 100% of the cell-free DNA. In someembodiments, methods provided herein can be used to detect a geneticalteration (e.g., one or more genetic alterations) in circulating tumorDNA present in cell-free DNA, where the circulating tumor DNA representsless than 100% of the cell-free DNA, e.g. 
about 95%, about 90%, about85%, about 80%, about 75%, about 70%, about 65%, about 60%, about 55%,about 50%, about 45%, about 40%, about 35%, about 30%, about 25%, about20%, about 15%, about 10%, about 5%, about 4%, about 3%, about 2%, about1%, about 0.95%, about 0.90%, about 0.85%, about 0.80%, about 0.75%,about 0.70%, about 0.65%, about 0.60%, about 0.55%, about 0.50%, about0.45%, about 0.40%, about 0.35%, about 0.30%, about 0.25%, about 0.20%,about 0.15%, about 0.10%, about 0.09%, about 0.08%, about 0.07%, about0.06%, about 0.05% of the cell-free DNA, or less.

In some embodiments, the presence of genetic biomarkers (e.g., mutationsin cell-free DNA (e.g., ctDNA)) may be tested from any of a variety ofbiological samples isolated or obtained from a subject (e.g., a humansubject) including, but not limited to blood, plasma, serum, urine,cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile,lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof.In some embodiments, one or more genetic biomarkers present in cell-freeDNA (e.g., ctDNA) and one or more protein biomarkers can be tested fromthe same sample. For example, a single sample can be isolated orobtained from a subject, which single sample can be tested for geneticbiomarkers present in cell-free DNA (e.g., ctDNA), one or more proteinbiomarkers, or both. The presence of one or more genetic biomarkers incell-free DNA (e.g., ctDNA) and the presence of one or more proteinbiomarkers can be tested from the sample at the same time or atdifferent times. For example, the sample can be tested for the presenceof one or more genetic biomarkers in cell-free DNA (e.g., ctDNA) at afirst time, and for the presence of one or more protein biomarkers at asecond time, or vice versa. In some embodiments, the sample can berefrigerated, frozen, or otherwise stored for future testing. In someembodiments, the presence of genetic biomarkers in cell-free DNA (e.g.,ctDNA) and the presence of one or more protein biomarkers can be testedfrom different samples. For example, a first sample can be isolated orobtained from a subject and tested for the presence of geneticbiomarkers in cell-free DNA (e.g., ctDNA), and a second sample can beisolated or obtained from the subject and tested for the presence of oneor more protein biomarkers. The first and second samples can be of thesame type (e.g., plasma or serum), or of different types. The firstand/or second samples can be refrigerated, frozen, or otherwise storedfor future testing.

In some embodiments, any of the variety of assays disclosed herein can be repeated to increase the accuracy of mutation detection. Assays may be done in duplicate or triplicate, for example. In some embodiments, positive assays can be repeated on the same initial sample from a patient. Additionally or alternatively, a second sample may be obtained from a patient at a later time, for example, when a positive result is found. Any of the variety of assays described herein, including ctDNA and/or protein biomarkers, may be repeated or run in parallel replicates.

In some embodiments, a genetic biomarker (e.g., a mutation in cell-free DNA) may be in a tumor suppressor gene or an oncogene. For example, the genetic biomarkers (e.g., mutations) may be in hot spots for mutations, e.g., sites that are frequently mutated in tumors or other cancers. In some embodiments, the mutation can be in KRAS, e.g., in codon 12 or 61. In other embodiments, the genetic biomarker (e.g., mutation) may be in other codons of KRAS. In some embodiments, the genetic biomarker (e.g., mutation) can be in CDKN2A (e.g., any of the CDKN2A mutations identified in Example 2). In some embodiments, the genetic biomarker (e.g., mutation) may be in tumor suppressor genes or oncogenes, including but not limited to ABL1; EVI1; MYC; APC; IL2; TNFAIP3; ABL2; EWSR1; MYCL1; ARHGEF12; JAK2; TP53; AKT1; FEV; MYCN; ATM; MAP2K4; TSC1; AKT2; FGFR1; NCOA4; BCL11B; MDM4; TSC2; ATF1; FGFR1OP; NFKB2; BLM; MEN1; VHL; BCL11A; FGFR2; NRAS; BMPR1A; MLH1; WRN; BCL2; FUS; NTRK1; BRCA1; MSH2; WT1; BCL3; GOLGA5; NUP214; BRCA2; NF1; BCL6; GOPC; PAX8; CARS; NF2; BCR; HMGA1; PDGFB; CBFA2T3; NOTCH1; BRAF; HMGA2; PIK3CA; CDH1; NPM1; CARD11; HRAS; PIM1; CDH11; NR4A3; CBLB; IRF4; PLAG1; CDK6; NUP98; CBLC; JUN; PPARG; SMAD4; PALB2; CCND1; KIT; PTPN11; CEBPA; PML; CCND2; KRAS; RAF1; CHEK2; PTEN; CCND3; LCK; REL; CREB1; RB1; CDX2; LMO2; RET; CREBBP; RUNX1; CTNNB1; MAF; ROS1; CYLD; SDHB; DDB2; MAFB; SMO; DDX5; SDHD; DDIT3; MAML2; SS18; EXT1; SMARCA4; DDX6; MDM2; TCL1A; EXT2; SMARCB1; DEK; MET; TET2; FBXW7; SOCS1; EGFR; MITF; TFG; FH; STK11; ELK4; MLL; TLX1; FLT3; SUFU; ERBB2; MPL; TPR; FOXP1; SUZ12; ETV4; MYB; USP6; GPC3; SYK; ETV6; IDH1; and TCF3. In some embodiments, protein biomarkers may be tested from any of a variety of biological samples isolated or obtained from a subject (e.g., a human subject) including, but not limited to blood, plasma, serum, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. 
Protein biomarkers that are found in high amounts in cancers can be tested for amounts of the protein biomarkers that do not occur in healthy human subjects. Examples of protein biomarkers, any one, two, three, or four of which may be tested, include, without limitation, carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), and osteopontin (OPN). In some embodiments, a threshold level of CA19-9 can be at least about 100 U/mL (e.g., about 100 U/mL). In some embodiments, a threshold level of CA19-9 can be 100 U/mL. In some embodiments, a threshold level of CEA can be at least about 7.5 ng/mL (e.g., about 7.5 ng/mL). In some embodiments, a threshold level of CEA can be 7.5 ng/mL. In some embodiments, a threshold level of HGF can be at least about 0.92 ng/mL (e.g., about 0.92 ng/mL). In some embodiments, a threshold level of HGF can be 0.92 ng/mL. In some embodiments, a threshold level of OPN can be at least about 158 ng/mL (e.g., about 158 ng/mL). In some embodiments, a threshold level of OPN can be 158 ng/mL. In some embodiments, a threshold level of CA19-9, CEA, HGF, and/or OPN can be 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 100% or more greater than the threshold levels listed above (e.g., greater than a threshold level of 100 U/mL for CA19-9, 7.5 ng/mL for CEA, 0.92 ng/mL for HGF, and/or 158 ng/mL for OPN).
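The threshold values given above lend themselves to a simple scoring routine. The sketch below marks a protein as positive when its measured level exceeds its threshold, optionally raised by a percentage through a `scale` factor (for example, 1.05 for a threshold 5% greater than the listed value), and then combines the protein result with a ctDNA result. The strict greater-than comparison, the OR rule used to combine the two results, and all function and parameter names are assumptions made for illustration; the disclosure does not prescribe this exact logic.

```python
# Threshold levels listed above: CA19-9 in U/mL, the others in ng/mL.
THRESHOLDS = {
    "CA19-9": 100.0,
    "CEA": 7.5,
    "HGF": 0.92,
    "OPN": 158.0,
}


def elevated_proteins(levels, thresholds=THRESHOLDS, scale=1.0):
    """Return the sorted names of proteins whose level exceeds the threshold.

    levels: dict mapping protein name to measured level, in the same units as
    the thresholds. scale raises every threshold, e.g. 1.05 for 5% higher.
    Proteins without a listed threshold are ignored.
    """
    return sorted(
        name
        for name, value in levels.items()
        if name in thresholds and value > thresholds[name] * scale
    )


def combined_test_positive(levels, mutation_detected, **kwargs):
    """Assumed combination rule: positive if a ctDNA mutation is detected
    or if any protein biomarker exceeds its threshold."""
    return bool(mutation_detected) or bool(elevated_proteins(levels, **kwargs))


if __name__ == "__main__":
    sample = {"CA19-9": 250.0, "CEA": 3.1, "HGF": 0.50, "OPN": 60.0}
    print(elevated_proteins(sample))
    print(combined_test_positive(sample, mutation_detected=False))
```

Because the thresholds are set high, most healthy samples score negative on every protein; under the assumed OR rule the ctDNA result then contributes additional sensitivity without relaxing the protein thresholds.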

Any protein biomarker known in the art may be used when a threshold value is obtained above which normal, healthy human subjects do not fall, but human subjects with cancer do fall. Non-limiting examples of such protein biomarkers include Translation elongation factor (EEF1A1); Glyceraldehyde-3-phosphate dehydrogenase (GAPDH); Actin gamma (ACTG1); Ferritin, heavy polypeptide 1 (FTH1); Eukaryotic translation elongation factor 1 gamma (EEF1G); Ribosomal protein, large subunit, P0 (RPLP0); Heat shock protein 90 kDa alpha (cytosolic), class B member 1 (HSP90AB1); Pyruvate kinase, muscle (PKM2); Ferritin, light polypeptide (FTL); and Ribosomal protein L3 (RPL3). Protein biomarkers known to be overexpressed in serum include but are not limited to Transferrin, α-1 antitrypsin, apolipoprotein 1, complement c3a, Caveolin-1, Kallikrein 6, Glucose regulated protein-8, α-defensin-1, -2, -3, Serum C-peptide, Alpha-2-HS glycoprotein, Catenin, Defensin α 6, MMPs, Cyclin D, S100P, Lamin A/C filament protein, and Txl-2 (thioredoxin-like protein-2).

In some embodiments, an assay includes detection of thresholded protein biomarkers in a biological sample (e.g., any biological sample disclosed herein such as plasma) without detection of genetic biomarkers (e.g., mutations in circulating tumor DNA (ctDNA)). For example, an assay may include detection of one or more of CA19-9, CEA, HGF, and/or OPN in a biological sample. In some embodiments, an assay may include detection of one or more of CA19-9, CEA, HGF, and/or OPN in a biological sample at any of the threshold levels disclosed herein. In some embodiments, once an assay that includes detection of thresholded protein biomarkers in a biological sample is performed, subsequent testing or monitoring is performed (e.g., any of the variety of further diagnostic testing or increased monitoring techniques disclosed herein). In some embodiments, once an assay that includes detection of thresholded protein biomarkers in a biological sample is performed, a second assay that includes detecting one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA) can be performed (e.g., detecting any of the variety of genetic alterations that are present in cell-free DNA or ctDNA as described herein).

In some embodiments, an assay includes detection of one or more geneticbiomarkers present in circulating tumor DNA (ctDNA) in a biologicalsample (e.g., any biological sample disclosed herein such as plasma)without detection of thresholded protein biomarkers. For example, anassay may include detection of genetic biomarkers (e.g., geneticalterations) in one or more of any of the genes disclosed hereinincluding, without limitation, CDKN2A, FGF2, GNAS, ABL1, EVI1, MYC, APC,IL2, TNFAIP3, ABL2, EWSR1, MYCL1, ARHGEF12, JAK2, TP53, AKT1, FEV, MYCN,ATM, MAP2K4, TSC1, AKT2, FGFR1, NCOA4, BCL11B, MDM4, TSC2, ATF1,FGFR1OP, NFKB2, BLM, MEN1, VHL, BCL11A, FGFR2, NRAS, BMPR1A, MLH1, WRN,BCL2, FUS, NTRK1, BRCA1, MSH2, WT1, BCL3, GOLGA5, NUP214, BRCA2, NF1,BCL6, GOPC, PAX8, CARS, NF2, BCR, HMGA1, PDGFB, CBFA2T3, NOTCH1, BRAF,HMGA2, PIK3CA, CDH1, NPM1, CARD11, HRAS, PIM1, CDH11, NR4A3, CBLB, IRF4,PLAG1, CDK6, NUP98, CBLC, JUN, PPARG, SMAD4, PALB2, CCND1, KIT, PTPN11,CEBPA, PML, CCND2, KRAS, RAF1, CHEK2, PTEN, CCND3, LCK, REL, CREB1, RB1,CDX2, LMO2, RET, CREBBP, RUNX1, CTNNB1, MAF, ROS1, CYLD, SDHB, DDB2,MAFB, SMO, DDX5, SDHD, DDIT3, MAML2, SS18, EXT1, SMARCA4, DDX6, MDM2,TCL1A, EXT2, SMARCB1, DEK, MET, TET2, FBXW7, SOCS1, EGFR, MITF, TFG, FH,STK11, ELK4, MLL, TLX1, FLT3, SUFU, ERBB2, MPL, TPR, FOXP1, SUZ12, ETV4,MYB, USP6, GPC3, SYK, ETV6, IDH1, and/or TCF3. In some embodiments, anassay may include detection of genetic alterations in KRAS (e.g., incodons 12 and/or 61 of KRAS). In some embodiments, once an assay thatincludes detection of one or more genetic biomarkers present in ctDNA ina biological sample is performed, subsequent testing or monitoring isperformed (e.g., any of the variety of further diagnostic testing orincreased monitoring techniques disclosed herein). 
In some embodiments,once an assay that includes detection of one or more genetic biomarkerspresent in ctDNA in a biological sample is performed, a second assaythat includes detecting one or more protein biomarkers at highthresholds can be performed (e.g., detecting any of the variety ofprotein biomarkers described herein including, but not limited to,carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA),hepatocyte growth factor (HGF), osteopontin (OPN), and combinationsthereof).

In some embodiments, one or more genetic biomarkers present in cell-freeDNA (e.g., ctDNA) and/or one or more protein biomarkers can be testedfrom any of a variety of biological samples isolated or obtained from asubject (e.g., a human subject) including, but not limited to the blood,plasma, serum, urine, cerebrospinal fluid, saliva, sputum,broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool,ascites, and combinations thereof. In some embodiments, one or moregenetic biomarkers present in cell-free DNA (e.g., ctDNA) and one ormore protein biomarkers can be tested from the same sample. For example,a single sample can be isolated or obtained from a subject, which singlesample can be tested for the presence of one or more genetic biomarkersin cell-free DNA (e.g., ctDNA), the presence of one or more proteinbiomarkers, or both. Genetic biomarkers present in cell-free DNA (e.g.,ctDNA) and one or more protein biomarkers can be tested from the sampleat the same time or at different times. For example, the sample can betested for the presence of one or more genetic biomarkers in cell-freeDNA (e.g., ctDNA) at a first time, and for one or more proteinbiomarkers at a second time, or vice versa. In some embodiments, thesample can be refrigerated, frozen, or otherwise stored for futuretesting. In some embodiments, the presence of one or more geneticbiomarkers in cell-free DNA (e.g., ctDNA) and the presence of one ormore protein biomarkers can be tested from different samples. Forexample, a first sample can be isolated or obtained from a subject andtested for the presence of one or more genetic biomarkers in cell-freeDNA (e.g., ctDNA), and a second sample can be isolated or obtained fromthe subject and tested for the presence of one or more proteinbiomarkers. The first and second samples can be of the same type (e.g.,plasma or serum), or of different types. The first and/or second samplescan be refrigerated, frozen, or otherwise stored for future testing.

In some embodiments, multiple codons or gene regions in a tumor suppressor gene or oncogene may be tested to identify a genetic biomarker (e.g., a mutation). For example, at least two, at least three, at least four, at least five or more codons or gene regions may be tested in a gene. Additionally or alternatively, multiple genes may be tested for mutations to increase the scope of an assay for more types of cancers or more cancers within a single type.

In some embodiments, a radiologic, sonographic, or other technique may be applied to any subject (e.g., a human subject) in which a genetic biomarker (e.g., a mutation) is detected. The technique may be applied to the whole body, to a single organ, or to a region of the body. The technique may be used, for example, to ascertain that a particular type of cancer is present, to confirm that a cancer is present, or to identify the location of a cancer in the body. In some embodiments, the technique is a scan. In some embodiments, the scan is a computed tomography (CT), a CT angiography (CTA), an esophagram (a barium swallow), a barium enema, a magnetic resonance imaging (MRI), a PET scan, an ultrasound (e.g., an endobronchial ultrasound, an endoscopic ultrasound), an X-ray, a DEXA scan, or a positron emission tomography and computed tomography (PET-CT) scan. In some embodiments, the technique is a physical examination, such as an anoscopy, a bronchoscopy (e.g., an autofluorescence bronchoscopy, a white-light bronchoscopy, a navigational bronchoscopy), a colonoscopy, a digital breast tomosynthesis, an endoscopic retrograde cholangiopancreatography (ERCP), an esophagogastroduodenoscopy, a mammography, a Pap smear, or a pelvic exam. In some embodiments, the technique is a biopsy (e.g., a bone marrow aspiration, a tissue biopsy). In some embodiments, the biopsy is performed by fine needle aspiration or by surgical excision. In some embodiments, the technique further includes obtaining a biological sample (e.g., a tissue sample, a urine sample, a blood sample, a cheek swab, a saliva sample, a mucosal sample (e.g., sputum, bronchial secretion), a nipple aspirate, a secretion or an excretion). In some embodiments, the technique includes determining exosomal proteins (e.g., an exosomal surface protein (e.g., CD24, CD147, PCA-3)) (Soung et al. (2017) Cancers 9(1):pii:E8). In some embodiments, the diagnostic testing method is an Oncotype DX® test (Baehner (2016) Ecancermedicalscience 10:675).

In some embodiments, cancers of organs other than the pancreas may be detected according to any of the variety of methods described herein.

In some embodiments, methods provided herein (e.g., methods in which the presence of one or more genetic biomarkers in cell-free DNA (e.g., ctDNA) and the presence of one or more high threshold protein biomarkers are detected in a biological sample isolated from the subject) can be used for selecting a treatment for a subject. For example, once a subject has been determined to have cancer (e.g., pancreatic cancer) by any of the variety of methods disclosed herein, an appropriate treatment can be selected (e.g., any of the variety of therapeutic interventions described herein). In some embodiments, methods provided herein (e.g., methods in which the presence of one or more genetic biomarkers in cell-free DNA (e.g., ctDNA) and the presence of one or more high threshold protein biomarkers are detected in a biological sample isolated from the subject) can be used for selecting a subject for treatment. For example, once a subject has been determined to have cancer (e.g., pancreatic cancer) by any of the variety of methods disclosed herein, that subject can be identified as an appropriate subject to receive a treatment (e.g., any of the variety of therapeutic interventions described herein). In some embodiments, methods provided herein (e.g., methods in which the presence of one or more genetic biomarkers in cell-free DNA (e.g., ctDNA) and the presence of one or more high threshold protein biomarkers are detected in a biological sample isolated from the subject) can be used for selecting a subject for increased monitoring. For example, once a subject has been determined to have cancer (e.g., pancreatic cancer) by any of the variety of methods disclosed herein, that subject can be identified as an appropriate subject to receive increased monitoring (e.g., any of the variety of monitoring techniques described herein). In some embodiments, methods provided herein (e.g., methods in which the presence of one or more genetic biomarkers in cell-free DNA (e.g., ctDNA) and the presence of one or more high threshold protein biomarkers are detected in a biological sample isolated from the subject) can be used for selecting a subject for further diagnostic testing. For example, once a subject has been determined to have cancer (e.g., pancreatic cancer) by any of the variety of methods disclosed herein, that subject can be identified as an appropriate subject to receive further diagnostic testing (e.g., any of the variety of diagnostic techniques described herein).

In some embodiments, methods provided herein can be used to detect the presence of cancer (e.g., pancreatic cancer) at a time period prior to diagnosis of the subject with an early-stage cancer and/or at a time prior to the subject exhibiting symptoms associated with cancer. For example, methods provided herein can be used when a subject has not been diagnosed with cancer and/or when a subject is not known to harbor a cancer cell.

In some embodiments of any of the methods described herein, the subject can be administered a single or multiple doses (e.g., two, three, four, five, six, seven, eight, nine, or ten doses) of any of the therapeutic interventions described herein.

In some embodiments, assays for genetic biomarkers (e.g., genetic alterations) can be combined with assays for elevated protein biomarkers to increase the sensitivity of a blood test for low stage pancreatic cancers. In some embodiments, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95% or more of such cancers can be detected through this combination test, including some patients with a favorable prognosis. In some embodiments, 64% of such cancers can be detected through this combination test, including some patients with a favorable prognosis. One of the design features of certain studies presented herein was that only patients with resectable pancreatic cancers were included, and patients with advanced disease (i.e., Stage III or IV) were excluded. Though this exclusion reduced the sensitivity that could otherwise be achieved by evaluating all pancreatic cancer patients, regardless of stage, the resectable cases represent a promising group with advantageous clinical relevance with respect to evaluating a screening technology. In some embodiments, methods provided herein can be used to detect all pancreatic cancers in subjects (e.g., human subjects).

Whether combining genetic biomarkers present in ctDNA and protein biomarkers could increase sensitivity over either alone was not known prior to the present disclosure. In fact, it was conceivable that the same patients with detectable circulating protein biomarkers would largely overlap those releasing DNA into the circulation. This was of particular concern for early stage cancer patients, because both ctDNA and protein-based biomarkers are known to be considerably higher in patients with advanced cancers compared to those with earlier stage cancers (Lennon A M & Goggins M (2010) Diagnostic and Therapeutic Response Markers. Pancreatic Cancer, (Springer New York, N.Y., N.Y.), pp 675-701; Locker G Y, et al. (2006) ASCO 2006 update of recommendations for the use of tumor markers in gastrointestinal cancer. J Clin Oncol 24(33):5313-5327; Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224).
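The overlap concern described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each cancer patient has an identifier and that the sets of ctDNA-positive and protein-positive patients are known. The function name, patient identifiers, and counts are invented for illustration and are not study data.

```python
def sensitivities(cancer_ids, ctdna_pos, protein_pos):
    """Return (ctDNA-alone, protein-alone, combined) sensitivity.

    A patient counts as detected by the combination when either
    assay is positive (an assumed "either/or" rule).
    """
    cancer = set(cancer_ids)
    dna = cancer & set(ctdna_pos)
    prot = cancer & set(protein_pos)
    n = len(cancer)
    return len(dna) / n, len(prot) / n, len(dna | prot) / n


# Hypothetical cohort of 10 cancer patients, IDs 0..9.
cohort = range(10)

# Partial overlap: the combination detects more patients than either assay.
partial = sensitivities(cohort, ctdna_pos={0, 1, 2}, protein_pos={2, 3, 4, 5})

# Complete overlap: ctDNA-positive patients are all protein-positive too,
# so the combination adds nothing over the protein assay alone.
nested = sensitivities(cohort, ctdna_pos={0, 1, 2}, protein_pos={0, 1, 2, 3})

print(partial)
print(nested)
```

Under this assumed rule, the gain from combining assays depends entirely on how many patients are positive by one assay but not the other.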

In some embodiments of the methods provided herein, very high specificity (e.g., 99.5%: 95% CI 97-100%) can be achieved. For example, only one false positive among 182 healthy individuals of average age 64 was observed in the studies presented herein. Given the relative infrequency of cancer in the general population, the specificity of any potentially useful blood-based screening test for pancreatic cancer is preferably high, e.g., preferably >99%. Otherwise, the number of false positives would greatly exceed the number of true positives (i.e., have suboptimal positive predictive value) (Lennon A M, et al. (2014) The Early Detection of Pancreatic Cancer: What Will It Take to Diagnose and Treat Curable Pancreatic Neoplasia? Cancer Res 74(13):3381-3389). Such stringency for screening tests is not typically required for tests to monitor disease in patients with known cancer. For monitoring, specificity can be relaxed somewhat in the interest of obtaining higher sensitivity. High specificity was achieved with methods disclosed herein in at least two ways. First, ctDNA was used as one of the components of the test. KRAS mutations are exquisitely specific for neoplasia and their specificity has traditionally been limited by technical rather than biological factors. The incorporation of molecular barcoding into various assays described herein (e.g., using a Safe-SeqS technique) can minimize the false positive results from sequencing that have traditionally been major technical issues confronting any ctDNA-based assays. KRAS mutations are particularly suitable for early detection strategies because they are rarely found in clones arising during age-associated clonal hematopoiesis. Such clones, which may represent early forms of myelodysplasia, are a potential source of false positive ctDNA assays. The vast majority of mutations in such clones occur within nine genes (DNMT3A, TET2, JAK2, ASXL1, TP53, GNAS, PPM1D, BCORL1 and SF3B1) (48-50), posing challenges for the use of these genes as biomarkers in ctDNA-based assays. Second, high thresholds were used for scoring the protein biomarkers as positive. These thresholds were based on prior studies in the literature or on an independent set of controls, permitting avoidance of positive scores in the vast majority of healthy patients (Kim J E, et al. (2004) Clinical usefulness of carbohydrate antigen 19-9 as a screening test for pancreatic cancer in an asymptomatic population. J Gastroenterol Hepatol 19(2):182-186). In some embodiments, such high thresholds can be used without an overall reduction in sensitivity because the ctDNA assay added sensitivity on its own and the ctDNA-positive cases only partially overlapped the protein-biomarker-positive cases (See, e.g., FIG. 9, FIG. 10, and Example 2).
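The specificity figure quoted at the start of the preceding paragraph (one false positive among 182 healthy individuals) can be checked with a short sketch. The disclosure does not state how the 95% confidence interval was computed. The code below assumes an exact binomial (Clopper-Pearson) interval, and its helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(p) == target, where f decreases as p increases."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Two-sided exact interval for k successes in n trials (assumed method)."""
    # Lower bound p_L satisfies P(X >= k | p_L) = alpha / 2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound p_U satisfies P(X <= k | p_U) = alpha / 2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


true_negatives, healthy = 181, 182
specificity = true_negatives / healthy
lower, upper = exact_ci(true_negatives, healthy)
print(f"specificity {specificity:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
```

Under this assumption, the point estimate and interval round to the figures stated in the text.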

Protein biomarkers have been combined with each other in the past to achieve higher sensitivity (Dong T, Liu C C, Petricoin E F, & Tang L L (2014) Combining markers with and without the limit of detection. Stat Med 33(8):1307-1320). For example, it was shown that combining CA19-9 and TIMP-1 was more sensitive for the detection of PDAC than either biomarker alone (Zhou W, et al. (1998) Identifying markers for pancreatic cancer by gene expression analysis. Cancer Epidemiol Biomarkers Prev 7(2):109-112). More recently, it was shown that the combination of CA19-9, TIMP-1, and LRG-1 was more sensitive for the detection of early PDAC than CA19-9 alone (Capello M, et al. (2017) Sequential Validation of Blood-Based Protein Biomarker Candidates for Early-Stage Pancreatic Cancer. J Natl Cancer Inst 109(4)). The combination of protein biomarkers with ultrasensitive ctDNA, as disclosed herein, is different. A recent study evaluated a combination of ctDNA and CA19-9 for pancreatic cancer but found no benefit to combining the biomarkers over CA19-9 alone. Without being bound by theory, it is possible this conclusion was reached due to inadequate sensitivity of the test used in detecting KRAS mutations (Le Calvez-Kelm F, et al. (2016) KRAS mutations in blood circulating cell-free DNA: a pancreatic cancer case-control. Oncotarget 7(48):78827-78840). Furthermore, the specificity for ctDNA achieved in that study was relatively low, reducing its suitability for screening.

In some embodiments, methods provided herein can be used to detect resectable pancreatic cancers through a non-invasive blood test in a majority of patients.

In some embodiments, results obtained using any of the variety of methods disclosed herein can underestimate the survival benefits of early detection. The majority of the patients that were studied herein, even though they had resectable cancers, were symptomatic and their cancers were discovered only by virtue of their symptoms. Accordingly, 77% of patients in the cohort described herein were Stage IIB and the median size of tumors in these patients was 3 cm. In some embodiments, in a screening study of asymptomatic individuals, a greater proportion of earlier stage patients, with smaller tumors, can be discovered using any of the variety of methods disclosed herein. In some embodiments, any of the variety of methods disclosed herein can be more sensitive for the detection of patients with larger tumors and with a poorer prognosis than for patients with smaller tumors, even though all tumors can be surgically resectable (See, e.g., FIG. 9B and Example 2). In some embodiments, KRAS mutations can be found in the circulation of patients with cancer types other than those of the pancreas, primarily those of the lung (Herbst R S, Heymach J V, & Lippman S M (2008) Lung cancer. N Engl J Med 359(13):1367-1380), and CA19-9, CEA, HGF, and OPN expression can be elevated in several other cancer types (Kim J E, et al. (2004) Clinical usefulness of carbohydrate antigen 19-9 as a screening test for pancreatic cancer in an asymptomatic population. J Gastroenterol Hepatol 19(2):182-186; Thomas D S, et al. (2015) Evaluation of serum CEA, CYFRA21-1 and CA125 for the early detection of colorectal cancer using longitudinal preclinical samples. Br J Cancer 113(2):268-274; Di Renzo M F, et al. (1995) Overexpression and amplification of the met/HGF receptor gene during the progression of colorectal cancer. Clin Cancer Res 1(2):147-154; El-Tanani M K, et al. (2006) The regulation and role of osteopontin in malignant transformation and cancer. Cytokine Growth Factor Rev 17(6):463-474). Thus, in some embodiments, patients testing positive using any of the variety of methods disclosed herein can undergo additional appropriate imaging studies to identify tumor localization.

In some embodiments, methods provided herein lay a foundation for evaluation of patients at high risk for PDAC, and for implementation of early detection strategies (Kalinich M, et al. (2017) An RNA-based signature enables high specificity detection of circulating tumor cells in hepatocellular carcinoma. Proc Natl Acad Sci USA 114(5):1123-1128). As an example, new-onset diabetes is known to be associated with an increased risk for pancreatic cancer. Approximately 1% of diabetic patients aged 50 and older are diagnosed with pancreatic cancer within 3 years of first meeting criteria for diabetes (Chari S T, et al. (2005) Probability of pancreatic cancer following diabetes: a population-based study. Gastroenterology 129(2):504-511). With an incidence of 1%, the PPV/NPV of certain combination assays disclosed herein is expected to be 54% and 99.6%, respectively, in this population, which is well within the range of currently approved screening tests for cancers.
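The PPV and NPV figures above follow from Bayes' rule. The passage does not list the inputs used. The sketch below assumes they are the 64% sensitivity and the one-false-positive-in-182 specificity reported elsewhere in this disclosure, together with the 1% incidence. The function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed inputs: 64% sensitivity, 181/182 specificity, 1% incidence.
ppv, npv = predictive_values(sensitivity=0.64,
                             specificity=181 / 182,
                             prevalence=0.01)
print(f"PPV {ppv:.0%}, NPV {npv:.1%}")
```

With these assumed inputs the result rounds to the stated 54% and 99.6%. Using a rounded specificity of 99.5% instead gives a PPV of about 56%, which illustrates how sensitive the PPV is to small changes in specificity at low prevalence.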

Available evidence indicates that many cancers have detectable ctDNA in their earliest stages, often more commonly than observed in pancreatic cancer (Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224). Similarly, a large number of protein biomarkers have already been described for the detection of numerous cancer types (Liotta L A & Petricoin E F, 3rd (2003) The promise of proteomics. Clin Adv Hematol Oncol 1(8):460-462). These protein biomarkers can be thresholded according to any of the variety of methods described herein, permitting the use of ctDNA-protein combinations to detect a variety of cancer types (Bettegowda C, et al. (2014) Detection of circulating tumor DNA in early- and late-stage human malignancies. Science translational medicine 6(224):224ra224).
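One way to picture thresholding protein biomarkers and combining them with a ctDNA result is sketched below. It is an illustration under stated assumptions, not the procedure of this disclosure: each threshold is set to the highest value seen in a hypothetical set of healthy controls, a sample is called positive if a ctDNA mutation is detected or any protein exceeds its threshold, and all concentrations are invented numbers in arbitrary units.

```python
def thresholds_from_controls(control_levels):
    """Assumed rule: threshold = highest level observed among healthy controls."""
    return {protein: max(levels) for protein, levels in control_levels.items()}


def combined_call(ctdna_mutation_detected, protein_levels, thresholds):
    """Return (positive, elevated_proteins) under an assumed either/or rule."""
    elevated = sorted(
        protein
        for protein, level in protein_levels.items()
        if protein in thresholds and level > thresholds[protein]
    )
    return bool(ctdna_mutation_detected) or bool(elevated), elevated


# Invented control data, arbitrary units.
controls = {"CA19-9": [12.0, 30.5, 8.1], "CEA": [1.2, 3.4, 2.0]}
thresholds = thresholds_from_controls(controls)

# Protein-only positive, ctDNA-only positive, and negative examples.
print(combined_call(False, {"CA19-9": 250.0, "CEA": 2.0}, thresholds))
print(combined_call(True, {"CA19-9": 10.0, "CEA": 1.0}, thresholds))
print(combined_call(False, {"CA19-9": 30.5, "CEA": 3.4}, thresholds))
```

Choosing the maximum control value guarantees that no control in that set scores positive. A real assay would need thresholds validated on independent data.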

Genetic Biomarkers in Combination with Aneuploidy

In one aspect, provided herein are methods and materials for detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for diagnosing or identifying the presence of a disease in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a subject who will or is likely to respond to a treatment by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a subject as a candidate for further diagnostic testing by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from the subject. In another aspect, provided herein are methods and materials for identifying a subject as a candidate for increased monitoring by detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from the subject.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high sensitivity in the detection or diagnosis of cancer (e.g., a high frequency or incidence of correctly identifying a subject as having cancer). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide a sensitivity in the detection or diagnosis of cancer (e.g., a high frequency or incidence of correctly identifying a subject as having cancer) that is higher than the sensitivity provided by separately detecting the presence of one or more members of a panel of genetic biomarkers or the presence of aneuploidy. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide a sensitivity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high sensitivity in detecting a single type of cancer. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high sensitivity in detecting two or more types of cancers. Any of a variety of cancer types can be detected using methods and materials provided herein (see, e.g., the section entitled “Cancers”). In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include pancreatic cancer. In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, or breast cancer. In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include cancers of the female reproductive tract (e.g., cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer). In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include bladder cancer or upper-tract urothelial carcinomas.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide a specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer) that is higher than the specificity provided by separately detecting the presence of one or more members of a panel of genetic biomarkers or the presence of aneuploidy. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide a specificity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. As will be understood by those of ordinary skill in the art, a specificity of 99% means that only 1% of subjects that do not have cancer are incorrectly identified as having cancer. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in detecting a single cancer (e.g., there is a low probability of incorrectly identifying that subject as having that single cancer type). In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in detecting two or more cancers (e.g., there is a low probability of incorrectly identifying that subject as having those two or more cancer types).
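The sensitivity and specificity definitions used above can be written out as a short sketch. The counts are invented for illustration and the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of subjects with cancer who are correctly identified."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of subjects without cancer who are correctly not flagged."""
    return true_neg / (true_neg + false_pos)


# Invented example: 1,000 cancer-free subjects, 10 flagged in error.
spec = specificity(true_neg=990, false_pos=10)
false_positive_rate = 1 - spec

# Invented example: 200 subjects with cancer, 150 detected.
sens = sensitivity(true_pos=150, false_neg=50)

print(f"specificity {spec:.0%}, false positive rate {false_positive_rate:.0%}")
print(f"sensitivity {sens:.0%}")
```

In this example a specificity of 99% corresponds to 1% of cancer-free subjects being incorrectly identified as having cancer, matching the statement in the text.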

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, and 2) the presence of aneuploidy. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and CDKN2A, and 2) the presence of aneuploidy. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject, including detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, and 2) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), and 3) the presence of aneuploidy. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more genetic biomarkers in each of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), and 3) the presence of aneuploidy. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject, including detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), and 3) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: bladder cancer or an upper-tract urothelial carcinoma.
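The two gene panels recited above can be represented as simple sets for bookkeeping. The sketch below is illustrative only. The constant and function names are hypothetical, the example findings are invented, and no decision rule for calling a subject positive is implied, since the text describes several alternative embodiments.

```python
# Gene lists copied from the two panels recited above.
PANEL_18 = frozenset({
    "NRAS", "PTEN", "FGFR2", "KRAS", "POLE", "AKT1", "TP53", "RNF43",
    "PPP2R1A", "MAPK1", "CTNNB1", "PIK3CA", "FBXW7", "PIK3R1", "APC",
    "EGFR", "BRAF", "CDKN2A",
})
PANEL_10 = frozenset({
    "TP53", "PIK3CA", "FGFR3", "KRAS", "ERBB2", "CDKN2A", "MLL", "HRAS",
    "MET", "VHL",
})


def summarize_findings(mutated_genes, aneuploidy_detected, panel,
                       tert_promoter_mutation=None):
    """Report which panel genes carry a biomarker alongside other findings.

    Genes outside the panel are ignored. The TERT promoter entry is
    included only when a value is supplied (relevant to the 10-gene panel).
    """
    summary = {
        "panel_genes_with_biomarker": sorted(panel & set(mutated_genes)),
        "aneuploidy_detected": bool(aneuploidy_detected),
    }
    if tert_promoter_mutation is not None:
        summary["tert_promoter_mutation"] = bool(tert_promoter_mutation)
    return summary


# Invented findings; NOTCH1 is outside both panels and is dropped.
example_18 = summarize_findings({"KRAS", "TP53", "NOTCH1"}, True, PANEL_18)
example_10 = summarize_findings({"FGFR3"}, False, PANEL_10,
                                tert_promoter_mutation=True)
print(example_18)
print(example_10)
```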

A sample obtained from a subject can be any of the variety of samples described herein that contains DNA (e.g., ctDNA in the blood, or DNA present in bladder, cervical, endometrial, or uterine samples) and/or proteins. In some embodiments, DNA (e.g., cell-free DNA (e.g., ctDNA) or DNA present in bladder, cervical, endometrial, or uterine samples) and/or proteins in a sample obtained from the subject are derived from a tumor cell. In some embodiments, DNA (e.g., cell-free DNA (e.g., ctDNA)) in a sample obtained from the subject includes one or more genetic biomarkers and/or aneuploid DNA. In some embodiments, proteins in a sample obtained from the subject include one or more protein biomarkers. Non-limiting examples of samples in which genetic biomarkers and/or protein biomarkers and/or aneuploidy can be detected include a blood sample, a plasma sample, a serum sample, a urine sample, an endometrial sample, a cervical sample, and a uterine sample. In some embodiments, the presence of one or more genetic biomarkers and the presence of aneuploidy are detected in a single sample obtained from the subject. In some embodiments, the presence of one or more genetic biomarkers is detected in a first sample obtained from a subject, and the presence of aneuploidy is detected in a second sample obtained from the subject.

In some embodiments, when a subject is determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer (e.g., by detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, and 2) the presence of aneuploidy), the subject is selected as a candidate for (e.g., is selected for) further diagnostic testing (e.g., any of the variety of further diagnostic testing methods described herein), the subject is selected as a candidate for (e.g., is selected for) increased monitoring (e.g., any of the variety of increased monitoring methods described herein), the subject is identified as a subject who will or is likely to respond to a treatment (e.g., any of the variety of therapeutic interventions described herein), the subject is selected as a candidate for (e.g., is selected for) a treatment, a treatment (e.g., any of the variety of therapeutic interventions described herein) is selected for the subject, and/or a treatment (e.g., any of the variety of therapeutic interventions described herein) is administered to the subject. In some embodiments, when a subject is determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer (e.g., by detecting the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), and 3) the presence of aneuploidy), the subject is selected as a candidate for (e.g., is selected for) further diagnostic testing (e.g., any of the variety of further diagnostic testing methods described herein), the subject is selected as a candidate for (e.g., is selected for) increased monitoring (e.g., any of the variety of increased monitoring methods described herein), the subject is identified as a subject who will or is likely to respond to a treatment (e.g., any of the variety of therapeutic interventions described herein), the subject is selected as a candidate for (e.g., is selected for) a treatment, a treatment (e.g., any of the variety of therapeutic interventions described herein) is selected for the subject, and/or a treatment (e.g., any of the variety of therapeutic interventions described herein) is administered to the subject. For example, when a subject is determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer, the subject can undergo further diagnostic testing, which further diagnostic testing can confirm the presence of cancer in the subject. Additionally or alternatively, the subject can be monitored at an increased frequency. In some embodiments of a subject determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer in which the subject undergoes further diagnostic testing and/or increased monitoring, the subject can additionally be administered a therapeutic intervention. In some embodiments, after a subject is administered a therapeutic intervention, the subject undergoes additional further diagnostic testing (e.g., the same type of further diagnostic testing as was performed previously and/or a different type of further diagnostic testing) and/or continued increased monitoring (e.g., increased monitoring at the same or at a different frequency as was previously done). In some embodiments, after a subject is administered a therapeutic intervention and the subject undergoes additional further diagnostic testing and/or additional increased monitoring, the subject is administered another therapeutic intervention (e.g., the same therapeutic intervention as was previously administered and/or a different therapeutic intervention). In some embodiments, after a subject is administered a therapeutic intervention, the subject is tested for the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, and 2) the presence of aneuploidy. In some embodiments, after a subject is administered a therapeutic intervention, the subject is tested for the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), and 3) the presence of aneuploidy.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy further include detecting the presence of one or more members of a panel of protein biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, or a different sample). Any of a variety of protein biomarkers can be detected (e.g., any of the variety of protein biomarkers and/or protein biomarker panels described herein).

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of protein biomarkers in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and CDKN2A, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of protein biomarkers in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, or a different sample). 
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, 2) the presence of aneuploidy, and 3) the presence of one or more members of a panel of protein biomarkers, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer.

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), and 3) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of protein biomarkers in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in each of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), and 3) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of protein biomarkers in a sample obtained from the subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, or a different sample). 
In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL, 2) the presence of a TERT promoter mutation (e.g., a genetic biomarker in a TERT promoter), 3) the presence of aneuploidy, and 4) the presence of one or more members of a panel of protein biomarkers, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: bladder cancer or an upper-tract urothelial carcinoma.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, each of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO).

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, each of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, each of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, and/or OPN. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy, each of the following protein biomarkers can further be detected: CA19-9, CEA, HGF, and OPN.
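
As an illustration only, the four protein biomarker panels listed above can be written out as data and queried for the members detected in a given sample. This is a minimal sketch, not part of the described methods: the panel keys (`panel_8`, `panel_11`, `panel_9`, `panel_4`) and the function `detected_members` are hypothetical names introduced here, and what counts as "detected" for a protein is assumed to have been decided upstream.

```python
# Protein biomarker panels copied from the text above.
# The dictionary keys are hypothetical labels, not names used in the disclosure.
PROTEIN_PANELS = {
    "panel_8": ["CA19-9", "CEA", "HGF", "OPN", "CA125",
                "prolactin", "TIMP-1", "MPO"],
    "panel_11": ["CA19-9", "CEA", "HGF", "OPN", "CA125", "AFP",
                 "prolactin", "TIMP-1", "follistatin", "G-CSF", "CA15-3"],
    "panel_9": ["CA19-9", "CEA", "HGF", "OPN", "CA125", "AFP",
                "prolactin", "TIMP-1", "CA15-3"],
    "panel_4": ["CA19-9", "CEA", "HGF", "OPN"],
}


def detected_members(panel_name, detected):
    """Return the members of the named panel that appear in `detected`.

    `detected` is any iterable of protein names already called as present
    (assumption: the calling threshold is applied before this step).
    The result keeps the panel's own ordering.
    """
    found = set(detected)
    return [marker for marker in PROTEIN_PANELS[panel_name] if marker in found]


if __name__ == "__main__":
    # Hypothetical sample in which CEA and AFP were called as present.
    print(detected_members("panel_4", ["CEA", "AFP"]))
    print(detected_members("panel_9", ["CEA", "AFP"]))
```

Keeping the panels as plain lists makes the "one or more of" and "each of" embodiments easy to distinguish: the first is a non-empty result, the second is a result whose length equals the panel's length.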

In some embodiments, any of the variety of methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers and the presence of aneuploidy in one or more samples obtained from a subject further include detecting the presence of one or more members of one or more additional classes of biomarkers. Non-limiting examples of such additional classes of biomarkers include: copy number changes, DNA methylation changes, other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements), peptides, and/or metabolites.

In some embodiments, the one or more additional classes of biomarkers include a metabolite biomarker. In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more metabolites indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more metabolites indicative of cancer. Non-limiting examples of metabolites indicative of cancer include: 5-methylthioadenosine (MTA), Glutathione reduced (GSH), N-acetylglutamate, Lactose, N-acetylneuraminate, UDP-acetylglucosamine, UDP-Acetylgalactosamine, UDP-glucuronate, Pantothenate, Arachidonate (20:4n6), Choline, Cytidine 5′-diphosphocholine, Dihomo-linolenate (20:3n3), Docosapentaenoate (DPA 22:5n3), Eicosapentaenoate (EPA 20:5n3), Glycerophosphorylcholine (GPC), Docosahexaenoate (DHA 22:6n3), Linoleate (18:2n6), Cytidine 5′-monophosphate (5′-CMP), Gamma-glutamylglutamate, X-14577, X-11583, Isovalerylcarnitine, Phosphocreatine, 2-Aminoadipic acid, Gluconic acid, O-Acetylcarnitine, aspartic acid, Deamido-NAD+, glutamic acid, Isobutyrylcarnitine, Carnitine, Pyridoxal, Citric acid, Adenosine, ATP, valine, XC0061, Isoleucine, γ-Butyrobetaine, Lactic acid, alanine, phenylalanine, Gluconolactone, leucine, Glutathione (GSSG) divalent, tyrosine, NAD+, XC0016, UTP, creatine, Theobromine, CTP, GTP, 3-Methylhistidine, Succinic acid, Glycerol 3-phosphate, glutamine, 5-Oxoproline, Thiamine, Butyrylcarnitine, 4-Acetamidobutanoic acid, UDP-Glucose, UDP-Galactose, threonine, N-Acetylglycine, proline, ADP, Choline, Malic acid, S-Adenosylmethionine, Pantothenic acid, Cysteinesulfinic acid, 6-Aminohexanoic acid, Homocysteic acid, Hydroxyproline, Methionine sulfoxide, 3-Guanidinopropionic acid, Glucose 6-phosphate, Phenaceturic acid, Threonic acid, tryptophan, Pyridoxine, N-Acetylaspartic acid, 4-Guanidinobutyric acid, serine, Citrulline, Betaine, N-Acetylasparagine, 2-Hydroxyglutaric acid, arginine, Glutathione 
(GSH), creatinine, Dihydroxyacetone phosphate, histidine, glycine, Glucose 1-phosphate, N-Formylglycine, Ketoprofen, lysine, beta-alanine, N-Acetylglutamic acid, 2-Amino-2-(hydroxymethyl)-1,3-propanediol, Ornithine, Phosphorylcholine, Glycerophosphocholine, Terephthalic acid, Glyceraldehyde 3-phosphate, Gly-Asp, Taurine, Fructose 1,6-diphosphate, 3-Aminoisobutyric acid, Spermidine, GABA, Triethanolamine, Glycerol, N-Acetylserine, N-Acetylornithine, Diethanolamine, AMP, Cysteine glutathione disulfide, Streptomycin sulfate+H2O divalent, trans-Glutaconic acid, Nicotinic acid, Isobutylamine, Betaine aldehyde+H2O, Urocanic acid, 1-Aminocyclopropane-1-carboxylic acid, Homoserinelactone, 5-Aminovaleric acid, 3-Hydroxybutyric acid, Ethanolamine, Isovaleric acid, N-Methylglutamic acid, Cystathionine, Spermine, Carnosine, 1-Methylnicotinamide, N-Acetylneuraminic acid, Sarcosine, GDP, N-Methylalanine, palmitic acid, 1,2-dioleoyl-sn-glycero-3-phospho-rac-glycerol, cholesterol 5α,6α-epoxide, lanosterol, lignoceric acid, 1oleoyl_rac_GL, cholesterol_epoxide, erucic acid, T-LCA, oleoyl-L-carnitine, oleanolic acid, 3-phosphoglycerate, 5-hydroxynorvaline, 5-methoxytryptamine, adenosine-5-monophosphate, alpha-ketoglutarate, asparagine, benzoic acid, hypoxanthine, maltose, maltotriose, methionine sulfoxide, nornicotine, phenol, Phosphoethanolamine, pyrophosphate, pyruvic acid, quinic acid, taurine, uric acid, inosine, lactamide, 5-hydroxynorvaline NIST, cholesterol, deoxypentitol, 2-hydroxyestrone, 2-hydroxyestradiol, 2-methoxyestrone, 2-methoxyestradiol, 2-hydroxyestrone-3-methyl ether, 4-hydroxyestrone, 4-methoxyestrone, 4-methoxyestradiol, 16alpha-hydroxyestrone, 17-epiestriol, estriol, 16-Ketoestradiol, 16-epiestriol, acylcarnitine C18:1, amino acids citrulline and trans-4-hydroxyproline, glycerophospholipids PC aa C28:1, PC ae C30:0 and PC ae C30:2, and sphingolipid SM (OH) C14:1. 
See, e.g., Halama et al., Nesting of colon and ovarian cancer cells in the endothelial niche is associated with alterations in glycan and lipid metabolism, Scientific Reports volume 7, Article number: 39999 (2017); Hur et al., Systems approach to characterize the metabolism of liver cancer stem cells expressing CD133, Sci Rep., 7: 45557, doi: 10.1038/srep45557, (2017); Eliassen et al., Urinary Estrogens and Estrogen Metabolites and Subsequent Risk of Breast Cancer among Premenopausal Women, Cancer Res; 72(3); 696-706 (2011); Gangi et al., Metabolomic profile in pancreatic cancer patients: a consensus-based approach to identify highly discriminating metabolites, Oncotarget, February 2; 7(5): 5815-5829 (2016); Kumar et al., Serum and Plasma Metabolomic Biomarkers for Lung Cancer, Bioinformation, 13(6): 202-208, doi: 10.6026/97320630013202 (2017); Schmidt et al., Pre-diagnostic metabolite concentrations and prostate cancer risk in 1077 cases and 1077 matched controls in the European Prospective Investigation into Cancer and Nutrition, BMC Med., 15: 122, doi: 10.1186/s12916-017-0885-6 (2017); each of which is incorporated herein by reference in its entirety.

In some embodiments, the one or more additional classes of biomarkers include a peptide (e.g., a peptide that is distinct from the various protein biomarkers described herein as being useful in one or more methods). In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more peptides indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more peptides indicative of cancer. In some embodiments, a peptide is derived from a protein (e.g., the peptide includes an amino acid sequence present in a protein biomarker or a different protein). Non-limiting examples of peptides indicative of cancer include the following peptides and peptides derived from the following proteins: CEACAM, CYFRA21-1, CA125, PKLK, ProGRP, NSE, TPA 6, TPA 7, TPA 8, NRG, NRG 100, CNDP, APOB100, SCC, VEGF, EGFR, PIK3CA, HER2, BRAF, ROS, RET, NRAS, MET, MEK1, HER2, C4.4A, PSF3, FAM83B, ECD, CTNNB, VIM, S100A4, S100A7, COX2, MUC1, KLKB1, SAA, HP-β chain, C9, Pgrmc1, Ciz1, Transferrin, α-1 antitrypsin, apolipoprotein 1, complement c3a, Caveolin-1, Kallikrein 6, Glucose regulated protein-8, α-defensin-1, -2, -3, Serum C-peptide, Alpha-2-HS glycoprotein, Tryptic KRT 8 peptide, Plasma glycoprotein, Catenin, Defensin α 6, MMPs, Cyclin D, S100 P, Lamin A/C filament protein, Heat shock protein, aldehyde dehydrogenase, Txl-2 (thioredoxin-like protein-2), P53, nm23, u-PA, VEGF, Eph B4, CRABP2, WT-1, Rab-3D, Mesothelin, ERα, ANXA4, PSAT1, SPB5, CEA5, CEA6, A1AT, SLPI, APOA4, VDBP, HE4, IL-1, -6, -7, -8, -10, -11, -12, -16, -18, -21, -23, -28A, -33, LIF, TNFR1-2, HVEM (TNFRSF14), IL1R-a, IL1R-b, IL-2R, M-CSF, MIP-1a, TNF-α, CD40, RANTES, CD40L, MIF, IFN-β, MCP-4 (CCL13), MIG (CXCL9), MIP-1δ (CCL15), MIP3a (CCL20), MIP-4 (CCL18), MPIF-1, SDF-1a+b (CXCL12), CD137/4-1BB, lymphotactin (XCL1), eotaxin-1 (CCL11), eotaxin-2 (CCL24), 6Ckine (CCL21), BLC (CXCL13), CTACK (CCL27), BCA-1 (CXCL13), 
HCC4 (CCL16), CTAP-3 (CXCL7), IGF1, VEGF, VEGFR3, EGFR, ErbB2, CTGF, PDGF AA, BB, PDGFRb, bFGF, TGFbRIII, β-cellulin, IGFBP1-4, 6, BDNF, PEDF, angiopoietin-2, renin, lysophosphatidic acid, β2-microglobulin, sialyl TN, ACE, CA 19-9, CEA, CA 15-3, CA-50, CA 72-4, OVX1, mesothelin, sialyl TN, MMP-2, -3, -7, -9, VAP-1, TIMP1-2, tenascin C, VCAM-1, osteopontin, KIM-1, NCAM, tetranectin, nidogen-2, cathepsin L, prostasin, matriptase, kallikreins 2, 6, 10, cystatin C, claudin, spondin2, SLPI, bHCG, urinary gonadotropin peptide, inhibin, leptin, adiponectin, GH, TSH, ACTH, PRL, FSH, LH, cortisol, TTR, osteocalcin, insulin, ghrelin, GIP, GLP-1, amylin, glucagon, peptide YY, follistatin, hepcidin, CRP, Apo A1, CIII, H, transthyretin, SAA, SAP, complement C3,4, complement factor H, albumin, ceruloplasmin, haptoglobin, β-hemoglobin, transferrin, ferritin, fibrinogen, thrombin, von Willebrand factor, myoglobin, immunosuppressive acidic protein, lipid-associated sialic acid, S100A12 (EN-RAGE), fetuin A, clusterin, α1-antitrypsin, α2-macroglobulin, serpin1 (human plasminogen activator inhibitor-1), Cox-1, Hsp27, Hsp60, Hsp80, Hsp90, lectin-type oxidized LDL receptor 1, CD14, lipocalin 2, ITIH4, sFasL, Cyfra21-1, TPA, perforin, DcR3, AGRP, creatine kinase-MB, human milk fat globule 1-2, NT-Pro-BNP, neuron-specific enolase, CASA, NB/70K, AFP, afamin, collagen, prohibitin, keratin-6, PARC, B7-H4, YK-L40, AFP-L3, DCP, GPC3, OPN, GP73, CK19, MDK, A2, 5-HIAA, CA15-3, CA19-9, CA27.29, CA72-4, calcitonin, CGA, BRAF V600E, BAP, BCR-ABL fusion protein, KIT, KRAS, PSA, Lactate dehydrogenase, NMP22, PAI-1, uPA, fibrin D-dimer, S100, TPA, thyroglobulin, CD20, CD24, CD44, RS/DJ-1, p53, alpha-2-HS-glycoprotein, lipophilin B, beta-globin, hemopexin, UBE2N, PSMB6, PPP1CB, CPT2, COPA, MSK1/2, Pro-NPY, Secernin-1, Vinculin, NAAA, PTK7, TFG, MCCC2, TRAP1, IMPDH2, PTEN, POSTN, EPLIN, eIF4A3, DDAH1, ARG2, PRDX3&4, P4HB, YWHAG, Enoyl CoA-hydrase, PHB, TUBB, KRT2, DES, HSP71, ATP5B, CKB, HSPD1, LMNA, EZH2, AMACR, FABP5, PPA2, EZR, 
SLP2, SM22, Bax, Smac/Diablo phosphorylated Bcl2, STAT3 and Smac/Diablo expression, PHB, PAP, AMACR, PSMA, FKBP4, PRDX4, KRT7/8/18, GSTP1, NDPK1, MTX2, GDF15, PCa-24, Caveolin-2, Prothrombin, Antithrombin-III, Haptoglobin, Serum amyloid A-1 protein, ZAG, ORM2, APOC3, CALML5, IGFBP2, MUC5AC, PNLIP, PZP, TIMP1, AMBP, inter-alpha-trypsin inhibitor heavy chain H1, inter-alpha-trypsin inhibitor heavy chain H2, inter-alpha-trypsin inhibitor heavy chain H3, V-type proton ATPase subunit B, kidney isoform, Hepatocyte growth factor-like protein, Serum amyloid P-component, Acylglycerol kinase, Leucine-rich repeat-containing protein 9, Beta-2-glycoprotein 1, Plasma protease C1 inhibitor, Lipoxygenase homology domain-containing protein 1, Protocadherin alpha-13. See, e.g., Kuppusamy et al., Volume 24, Issue 6, September 2017, Pages 1212-1221; Elzek and Rodland, Cancer Metastasis Rev. 2015 March; 34(1): 83-96; Noel and Lokshin, Future Oncol. 2012 January; 8(1): 55-71; Tsuchiya et al., World J Gastroenterol. 2015 Oct. 7; 21(37): 10573-10583; Lou et al., Biomark Cancer. 2017; 9: 1-9; Park et al., Oncotarget. 2017 Jun. 27; 8(26): 42761-42771; Saraswat et al., Cancer Med. 2017 July; 6(7): 1738-1751; Zamay et al., Cancers (Basel). 2017 November; 9(11): 155; Tanase et al., Oncotarget. 2017 Mar. 14; 8(11): 18497-18512, each of which is incorporated herein by reference in its entirety.

In some embodiments, the one or more additional classes of biomarkers include nucleic acid lesions or variations (e.g., a nucleic acid lesion or variation that is distinct from the various genetic biomarkers described herein as being useful in one or more methods). In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. Non-limiting examples of nucleic acid lesions or variations include copy number changes, DNA methylation changes, and/or other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements). Translocations and genomic rearrangements have been correlated with various cancers (e.g., prostate, glioma, lung cancer, non-small cell lung cancer, melanoma, and thyroid cancer) and used as biomarkers for years (e.g., Demeure et al., 2014, World J Surg., 38:1296-305; Hogenbirk et al., 2016, PNAS USA, 113:E3649-56; Gasi et al., 2011, PLoS One, 6:e16332; Ogiwara et al., 2008, Oncogene, 27:4788-97; U.S. Pat. Nos. 9,745,632 and 6,576,420). In addition, changes in copy number have been used as biomarkers for various cancers including, without limitation, head and neck squamous cell carcinoma, lymphoma (e.g., non-Hodgkin's lymphoma) and colorectal cancer (Kumar et al., 2017, Tumour Biol., 39:1010428317740296; Kumar et al., 2017, Tumour Biol., 39:1010428317736643; Henrique et al., 2014, Expert Rev. Mol. Diagn., 14:419-22; and U.S. Pat. No. 9,816,139). DNA methylation and changes in DNA methylation (e.g., hypomethylation, hypermethylation) also are used as biomarkers in cancer. For example, hypomethylation has been associated with hepatocellular carcinoma (see, for example, Henrique et al., 2014, Expert Rev. Mol. 
Diagn., 14:419-22), esophageal carcinogenesis (see, for example, Alvarez et al., 2011, PLoS Genet., 7:e1001356) and gastric and liver cancer (see, for example, U.S. Pat. No. 8,728,732), and hypermethylation has been associated with colorectal cancer (see, for example, U.S. Pat. No. 9,957,570). In addition to genome-wide changes in methylation, specific methylation changes within particular genes can be indicative of specific cancers (see, for example, U.S. Pat. No. 8,150,626). Li et al. (2012, J. Epidemiol., 22:384-94) provides a review of the association between numerous cancers (e.g., breast, bladder, gastric, lung, prostate, head and neck squamous cell, and nasopharyngeal) and aberrant methylation. Additionally or alternatively, additional types of nucleic acids or features of nucleic acids have been associated with various cancers. Non-limiting examples of such nucleic acids or features of nucleic acids include the following: the presence or absence of various microRNAs (miRNAs) has been used in the diagnosis of colon, prostate, colorectal, and ovarian cancers (see, for example, D'Souza et al., 2018, PLoS One, 13:e0194268; Fukagawa et al., 2017, Cancer Sci., 108:886-96; Giraldez et al., 2018, Methods Mol. Biol., 1768:459-74; U.S. Pat. Nos. 8,343,718; 9,410,956; and 9,074,206; for a review on the specific association of miR-22 with cancer, see Wang et al., 2017, Int. J. Oncol., 50:345-55); the abnormal expression of long non-coding RNAs (lncRNAs) also has been used as a biomarker in cancers such as prostate cancer, colorectal cancer, cervical cancer, melanoma, non-small cell lung cancer, gastric cancer, endometrial carcinoma, and hepatocellular carcinoma (see, for example, Wang et al., 2017, Oncotarget, 8:58577086; Wang et al., 2018, Mol. Cancer, 17:110; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4812-9; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:993-1002; Zhang et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4820-7; Zhang et al., 2018, Eur. Rev. Med. Pharmacol. 
Sci., 22:2304-9; Xie et al., 2018, EBioMedicine, 33:57-67; and U.S. Pat. No. 9,410,206); the presence or absence of circular RNA (circRNA) has been used as a biomarker in lung cancer, breast cancer, gastric cancer, colorectal cancer, and liver cancer (e.g., Geng et al., 2018, J. Hematol. Oncol., 11:98) and melanoma (e.g., Zhang et al., 2018, Oncol. Lett., 16:1219-25); changes in telomeric DNA (e.g., in length or in heterozygosity) or centromeric DNA (e.g., changes in expression of centromeric genes) also have been associated with cancers (e.g., prostate, breast, lung, lymphoma, and Ewing's sarcoma) (see, for example, Baretton et al., 1994, Cancer Res., 54:4472-80; Liscia et al., 1999, Br. J. Cancer, 80:821-6; Proctor et al., 2009, Biochim. Biophys. Acta, 1792:260-74; and Sun et al., 2016, Int. J. Cancer, 139:899-907); various mutations (e.g., deletions), rearrangements and/or copy number changes in mitochondrial DNA (mtDNA) have been used prognostically and diagnostically for various cancers (e.g., prostate cancer, melanoma, breast cancer, lung cancer, and colorectal cancer) (see, for example, Maragh et al., 2015, Cancer Biomark., 15:763-73; Shen et al., 2010, Mitochondrion, 10:62-68; Hosgood et al., 2010, Carcinogen., 31:847-9; Thyagarajan et al., 2012, Cancer Epid. Biomarkers & Prev., 21:1574-81; and U.S. Pat. No. 9,745,632); and the abnormal presence, absence or amount of messenger RNAs (mRNAs) also has been correlated with various cancers including, without limitation, breast cancer, Wilms' tumors, and cervical cancer (see, for example, Guetschow et al., 2012, Anal. Bioanal. Chem., 404:399-406; Schwienbacher et al., 2000, Cancer Res., 60:1521-5; and Ngan et al., 1997, Genitourin Med., 73:54-8). Each of these citations is incorporated herein by reference in its entirety.

In certain aspects, provided herein are methods of detecting diseases in a subject (e.g., a human). Various methods disclosed herein provide a broadly applicable approach for non-invasive detection of cancer in subjects (e.g., a cancer such as, without limitation, endometrial or ovarian cancer). Various methods disclosed herein provide a broadly applicable approach for treatment of a subject having or suspected of having cancer after non-invasive detection of cancer in subjects (e.g., a cancer such as, without limitation, endometrial or ovarian cancer).

In some embodiments, methods provided herein include detecting genetic biomarkers (e.g., mutations) in one or more genes from cells present in a sample (e.g., a cervical or endometrial sample) obtained from a subject. For example, methods provided herein can be used to detect the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes selected from the group consisting of: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and CDKN2A, wherein the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes is indicative of the presence of ovarian or endometrial cancer in the subject. In some embodiments, the methods provided herein include detecting in a sample obtained from a subject the presence of aneuploidy (e.g., monosomy or trisomy), wherein the presence of aneuploidy is indicative of the presence of ovarian or endometrial cancer in the subject. In some embodiments, methods provided herein include detecting in a sample obtained from a subject each of the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A) and the presence of aneuploidy (e.g., monosomy or trisomy). In some embodiments, methods which include detecting in a sample obtained from a subject each of the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A) and the presence of aneuploidy (e.g., monosomy or trisomy) provide a better indication that the subject has a cancer (e.g., an endometrial or ovarian cancer) than methods in which either aspect is tested individually.
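
One way to picture why testing both aspects can outperform testing either one alone is sketched below. This is an illustrative sketch under stated assumptions, not the disclosed method: it assumes a simple "positive if either aspect is positive" combination rule, which the text above does not specify, and it uses four invented sample records rather than any data from this disclosure. The names `SampleResult`, `combined_positive`, and `fraction_positive` are hypothetical.

```python
from dataclasses import dataclass

# The 18-gene panel named in the text above.
PANEL_18 = frozenset({
    "NRAS", "PTEN", "FGFR2", "KRAS", "POLE", "AKT1", "TP53", "RNF43",
    "PPP2R1A", "MAPK1", "CTNNB1", "PIK3CA", "FBXW7", "PIK3R1", "APC",
    "EGFR", "BRAF", "CDKN2A",
})


@dataclass(frozen=True)
class SampleResult:
    """Hypothetical per-sample record; field names are assumptions."""
    mutated_genes: frozenset  # genes with at least one detected genetic biomarker
    aneuploidy: bool          # True if aneuploidy was detected


def mutation_positive(result):
    """Positive if any panel gene carries a detected genetic biomarker."""
    return bool(result.mutated_genes & PANEL_18)


def aneuploidy_positive(result):
    """Positive if aneuploidy was detected."""
    return result.aneuploidy


def combined_positive(result):
    """ASSUMED rule: positive if either aspect is positive (logical OR)."""
    return mutation_positive(result) or aneuploidy_positive(result)


def fraction_positive(results, test):
    """Fraction of samples called positive by `test`."""
    results = list(results)
    return sum(1 for r in results if test(r)) / len(results)


# Invented records standing in for samples from subjects known to have cancer.
HYPOTHETICAL_CANCER_SAMPLES = [
    SampleResult(frozenset({"TP53"}), False),           # mutation only
    SampleResult(frozenset(), True),                    # aneuploidy only
    SampleResult(frozenset({"PTEN", "PIK3CA"}), True),  # both
    SampleResult(frozenset(), False),                   # neither
]

if __name__ == "__main__":
    for name, test in [("mutation", mutation_positive),
                       ("aneuploidy", aneuploidy_positive),
                       ("combined", combined_positive)]:
        print(name, fraction_positive(HYPOTHETICAL_CANCER_SAMPLES, test))
```

Under an OR rule the combined test can never detect fewer cancer samples than either component, because every sample positive on one aspect is also positive on the combination. The same rule can also raise the number of false positives in subjects without cancer, so any gain in sensitivity has to be weighed against specificity on control samples.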

In some embodiments, a sample for detecting the presence of a cancer (e.g., an ovarian or endometrial cancer) can be collected using a Pap brush. In some embodiments of any of the variety of methods provided herein, a sample for detecting the presence of a cancer (e.g., an ovarian or endometrial cancer) can be collected using a Tao brush.

In some embodiments, methods provided herein further include testing a sample obtained from a subject (e.g., a plasma sample) for genetic biomarkers in nucleic acids that are present as circulating tumor DNA (ctDNA). For example, a sample (e.g., a plasma sample) can be tested to detect genetic biomarkers in nucleic acids that harbor one or more mutations in one or more of the following genes: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, and/or TP53.
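
The 16-gene list given here for plasma ctDNA differs from the 18-gene list given earlier for cervical or endometrial samples. The short sketch below simply compares the two lists as sets so the overlap is explicit; the variable names `CELL_SAMPLE_PANEL` and `PLASMA_PANEL` are hypothetical labels introduced for illustration, and the gene symbols are copied from the text.

```python
# 18 genes listed for cervical or endometrial samples.
CELL_SAMPLE_PANEL = {
    "NRAS", "PTEN", "FGFR2", "KRAS", "POLE", "AKT1", "TP53", "RNF43",
    "PPP2R1A", "MAPK1", "CTNNB1", "PIK3CA", "FBXW7", "PIK3R1", "APC",
    "EGFR", "BRAF", "CDKN2A",
}

# 16 genes listed for plasma (ctDNA) samples.
PLASMA_PANEL = {
    "AKT1", "APC", "BRAF", "CDKN2A", "CTNNB1", "EGFR", "FBXW7", "FGFR2",
    "GNAS", "HRAS", "KRAS", "NRAS", "PIK3CA", "PPP2R1A", "PTEN", "TP53",
}

shared = CELL_SAMPLE_PANEL & PLASMA_PANEL        # genes on both lists
plasma_only = PLASMA_PANEL - CELL_SAMPLE_PANEL   # genes only on the plasma list
cell_only = CELL_SAMPLE_PANEL - PLASMA_PANEL     # genes only on the 18-gene list

if __name__ == "__main__":
    print(len(shared), sorted(plasma_only), sorted(cell_only))
```

Fourteen genes appear on both lists; GNAS and HRAS appear only on the plasma list, and POLE, RNF43, MAPK1, and PIK3R1 appear only on the 18-gene list.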

In various embodiments of methods provided herein in which one or more genetic biomarkers (e.g., mutations) in genes in cells present in a sample (e.g., a cervical or endometrial sample) obtained from a subject are detected, genetic biomarkers (e.g., mutations) in one or more of NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A can be detected. In some embodiments, one or more genetic biomarkers (e.g., mutations) in 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, or 17 of these genes can be detected. In some embodiments, one or more genetic biomarkers (e.g., mutations) in all 18 of these genes can be detected. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be a mutation shown in Table 15. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be a mutation shown in Table 16. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be a mutation shown in Table 17. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be a mutation in a gene shown in Table 15. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be a mutation in a gene shown in Table 16. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be a mutation in a gene shown in Table 17. In some embodiments, methods provided herein to detect the presence of an ovarian or endometrial cancer by detecting the presence of one or more genetic biomarkers (e.g., mutations) in one or more of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A can be combined with the detection of aneuploidy, the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both. 
In some embodiments, combining with the detection of aneuploidy, the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both can increase the specificity and/or sensitivity of detecting ovarian or endometrial cancer. In some embodiments, the sample is collected using a Pap brush. In some embodiments, the sample is collected using a Tao brush.

In some embodiments, methods provided herein can be used to detect the presence of an endometrial cancer. For example, methods provided herein can be used to detect the presence of one or more genetic biomarkers (e.g., mutations) in one or more of the following genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/or PPP2R1A, wherein the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes is indicative of the presence of endometrial cancer in the subject. In some embodiments, one or more genetic biomarkers (e.g., mutations) in 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11 of these genes can be detected. In some embodiments, one or more genetic biomarkers (e.g., mutations) in all 12 of these genes can be detected. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be any mutation described herein (e.g., a mutation as shown in any one of Tables 15, 16, or 17). In some embodiments, methods provided herein to detect the presence of an endometrial cancer by detecting the presence of one or more genetic biomarkers (e.g., mutations) in one or more of the following genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/or PPP2R1A can be combined with the detection of aneuploidy, the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both. In some embodiments, combining with the detection of aneuploidy, the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both can increase the specificity and/or sensitivity of detecting endometrial cancer. In some embodiments, the sample is collected using a Pap brush. In some embodiments, the sample is collected using a Tao brush.
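
The phrase "mutations in 1, 2, ... or all 12 of these genes" counts genes, not individual mutations. A minimal sketch of that tally follows, assuming variant calls arrive as (gene, label) pairs; that input format, the function name `genes_with_biomarkers`, and the placeholder labels `m1` to `m4` are all hypothetical and do not correspond to entries in Tables 15, 16, or 17.

```python
# The 12 genes named in the text above for endometrial cancer.
ENDOMETRIAL_PANEL = frozenset({
    "PTEN", "TP53", "PIK3CA", "PIK3R1", "CTNNB1", "KRAS",
    "FGFR2", "POLE", "APC", "FBXW7", "RNF43", "PPP2R1A",
})


def genes_with_biomarkers(variant_calls):
    """Return the sorted panel genes that carry at least one detected call.

    `variant_calls` is an iterable of (gene, label) pairs (assumed format).
    Several calls in the same gene count once; off-panel genes are ignored.
    """
    return sorted({gene for gene, _label in variant_calls
                   if gene in ENDOMETRIAL_PANEL})


if __name__ == "__main__":
    # Placeholder calls: two in PTEN, one in TP53, one in off-panel BRAF.
    calls = [("PTEN", "m1"), ("PTEN", "m2"), ("TP53", "m3"), ("BRAF", "m4")]
    hits = genes_with_biomarkers(calls)
    print(hits, len(hits))
```

With these placeholder calls the tally is two genes (PTEN and TP53): the second PTEN call does not add to the count, and BRAF is excluded because it is not among the 12 genes listed for endometrial cancer.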

In some embodiments, methods provided herein can be used to detect the presence of an ovarian cancer. For example, methods provided herein can be used to detect the presence of one or more genetic biomarkers (e.g., mutations) in TP53, wherein the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes is indicative of the presence of ovarian cancer in the subject. In some embodiments, an ovarian cancer detected by detecting the presence of a genetic biomarker (e.g., mutation) in TP53 can be a high-grade ovarian cancer. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in TP53 can be any TP53 mutation described herein (e.g., a TP53 mutation as shown in any one of Tables 15, 16, or 17). In some embodiments, methods provided herein to detect the presence of an ovarian cancer by detecting the presence of one or more genetic biomarkers (e.g., mutations) in TP53 can be combined with the detection of aneuploidy, the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both. In some embodiments, combining with the detection of aneuploidy, the detection of mutations present in ctDNA, or both can increase the specificity and/or sensitivity of detecting ovarian cancer. In some embodiments, the sample is collected using a Pap brush. In some embodiments, the sample is collected using a Tao brush.

Genetic biomarkers (e.g., mutations) in one or more of the genes described herein can be detected by any of the exemplary techniques for detecting mutations described herein. Moreover, those of ordinary skill in the art will be aware of other suitable methods for detecting genetic biomarkers (e.g., mutations) in these genes.

In some embodiments, methods provided herein (e.g., method including thedetection of one or more genetic biomarkers (e.g., mutations) in any ofthe genes described herein, the detection of aneuploidy, or both)further include testing a sample obtained from a subject (e.g., a plasmasample) for genetic biomarkers in nucleic acids that are present ascirculating tumor DNA (ctDNA). In some embodiments, the sample includesnucleic acids that harbor one or more of genetic biomarkers (e.g.,mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE,AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC,EGFR, BRAF, and/or CDKN2A) which nucleic acids can be assayed accordingto any of the variety of methods disclosed herein. In some embodiments,the plasma sample includes nucleic acids that harbor one or more ofgenetic biomarkers (e.g., mutations) in one or more genes (e.g., AKT1,APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS,PIK3CA, PPP2R1A, PTEN, and/or TP53) which nucleic acids can be assayedaccording to any of the variety of methods disclosed herein. In someembodiments, the presence of one or more genetic biomarkers (e.g.,mutations) in a gene listed in Table 15 can be detected in a sample(e.g., a plasma sample). In some embodiments, the presence of one ormore genetic biomarkers (e.g., mutations) listed in Table 15 can bedetected in a sample (e.g., a plasma sample). In some embodiments, thepresence of one or more genetic biomarkers (e.g., mutations) in a genelisted in Table 16 can be detected in a sample (e.g., a plasma sample).In some embodiments, the presence of one or more genetic biomarkers(e.g., mutations) listed in Table 16 can be detected in a sample (e.g.,a plasma sample). In some embodiments, the presence of one or moregenetic biomarkers (e.g., mutations) in a gene listed in Table 17 can bedetected in a sample (e.g., a plasma sample). 
In some embodiments, the presence of one or more genetic biomarkers (e.g., mutations) listed in Table 17 can be detected in a sample (e.g., a plasma sample). As will be appreciated by those of ordinary skill in the art, such ctDNA can represent nucleic acids that are shed from cancer cells (e.g., cervical cancer cells, endometrial cancer cells, ovarian cancer cells, and/or fallopian tubal cancer cells) and as such, can be assayed using any of the variety of methods provided herein to determine the presence of a cancer in the subject. In some embodiments, the sample for detecting the presence of one or more mutations in ctDNA is, or can include, blood (e.g., whole blood, serum, or plasma), amnion, tissue, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid (e.g., ovarian cyst fluid), stool, ascites, pap smears, peritoneal fluid, peritoneal lavage, uterine lavage, and combinations thereof. Mutations in ctDNA can be detected by any of the exemplary techniques for detecting mutations described herein. Moreover, those of ordinary skill in the art will be aware of other suitable methods for detecting mutations in ctDNA.

In some embodiments, methods provided herein include detecting in a sample (e.g., a cervical or endometrial sample) obtained from a subject the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A) and/or aneuploidy (e.g., monosomy or trisomy). In some embodiments, the sample is a cervical sample. In some embodiments, the sample is an endometrial sample. In some embodiments, the sample comprises tissue or cells from each of the cervix and the endometrium. In some embodiments, a sample is obtained with a Pap brush. In some embodiments, a sample is obtained with a Tao brush. In some embodiments, methods include isolating cells from the rest of the sample. For example, cells can be completely isolated from other components of the sample, or can be isolated to a degree such that the isolated cells include only small amounts of other material from the sample. In some embodiments, nucleic acids present in cells isolated from a sample can be assayed using any of the variety of methods provided herein. For example, nucleic acids present in cells from the sample can be isolated and assayed.

In some embodiments, methods provided herein include detecting in a sample (e.g., a cervical or endometrial sample) obtained from a subject the presence of one or more genetic biomarkers (e.g., mutations) in one or more of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, wherein at least one of the genetic biomarkers (e.g., at least one of the mutations) is present at a low frequency in the sample. For example, methods provided herein can detect a genetic biomarker (e.g., a mutation) when the genetic biomarker (e.g., the mutation) is present in 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, 1% or fewer of the cells in the sample. In some embodiments, methods provided herein can detect a genetic biomarker (e.g., a mutation) when the genetic biomarker (e.g., the mutation) is present in less than 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, or 1% of the total nucleic acid present in the sample.
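The low-frequency thresholds above amount to a simple ratio check. The following is a minimal sketch in Python, assuming (hypothetically) that an assay reports counts of mutant and total template molecules at a locus. The function names and the 1% default threshold are illustrative only and are not taken from the methods described herein.

```python
def mutant_fraction(mutant_count: int, total_count: int) -> float:
    """Fraction of templates (or cells) carrying the mutation."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    if not 0 <= mutant_count <= total_count:
        raise ValueError("mutant_count must be between 0 and total_count")
    return mutant_count / total_count


def is_low_frequency(mutant_count: int, total_count: int,
                     threshold: float = 0.01) -> bool:
    """True when the mutation is present, but at or below `threshold`.

    The 0.01 (1%) default is an assumption chosen from the range of
    values listed in the text (0.1% to 1%).
    """
    fraction = mutant_fraction(mutant_count, total_count)
    return 0 < fraction <= threshold


# Hypothetical counts: 3 mutant templates among 10,000 (0.03%).
print(is_low_frequency(3, 10_000))
```

Whether the denominator counts cells or total nucleic acid molecules depends on the embodiment; the sketch treats both the same way.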

In some embodiments of any of the variety of methods disclosed herein in which the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A) and/or the presence of aneuploidy (e.g., monosomy or trisomy) is detected, cytology may be performed in combination with or independently of the method. For example, cytology can be performed in combination with any of the variety of methods disclosed herein to improve the detection of a cancer (e.g., an ovarian or endometrial cancer) in the subject. In some embodiments, performing cytology in combination with detecting the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A) and/or the presence of aneuploidy (e.g., monosomy or trisomy) increases the sensitivity of the assay (e.g., by at least 10%, 20%, 30%, 40%, 50%, 60% or more). In some embodiments, performing cytology in combination with detecting the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A) and/or the presence of aneuploidy (e.g., monosomy or trisomy) increases the specificity of the assay (e.g., by at least 10%, 20%, 30%, 40%, 50%, 60% or more).
In some embodiments, performing cytology in combination withdetecting the presence of one or more genetic biomarkers (e.g.,mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE,AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC,EGFR, BRAF, and/or CDKN2A) and/or the presence of aneuploidy (e.g.,monosomy or trisomy) permits the detection of cancers that wouldotherwise be undetectable or only rarely detectable with cytology alone(e.g., low-grade tumors). As another example, cytology can be performedindependently to confirm the presence of a cancer (e.g., an ovarian orendometrial cancer) once its presence is determined by detecting thepresence of one or more genetic biomarkers (e.g., mutations) in one ormore genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43,PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/orCDKN2A) and/or aneuploidy (e.g., monosomy or trisomy). In someembodiments, methods provided herein include detecting each of thepresence of one or more genetic biomarkers (e.g., mutations) in one ormore genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43,PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/orCDKN2A) and the presence of aneuploidy (e.g., monosomy or trisomy), andperforming cytology.

In some embodiments, any of the variety of methods disclosed herein can be performed on subjects who have previously undergone treatments for cancer (e.g., an ovarian or endometrial cancer). In some embodiments, methods provided herein can be used to determine the efficacy of the treatment. For example, a subject having an ovarian or endometrial cancer can be administered a treatment (also referred to herein as a “therapeutic intervention”), after which the continued presence of cancer or the amount of cancer (or lack thereof) is determined by detecting the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A) and/or the presence of aneuploidy (e.g., monosomy or trisomy).

In certain aspects, provided herein are methods of detecting diseases in a subject (e.g., a human). Various methods disclosed herein provide a broadly applicable approach for non-invasive detection of cancer (e.g., an early-stage cancer such as, without limitation, bladder cancer or upper tract urothelial carcinomas (UTUC)). In some embodiments, the disease detected is cancer. In some embodiments, the cancer detected is malignant. In some embodiments, the disease detected is related to the urinary tract. In some embodiments, the disease detected is a cancer affecting the urinary tract. In some embodiments, the disease detected is bladder cancer. In some embodiments, the disease detected is related to the renal pelvis. In some embodiments, the disease detected is a cancer affecting the renal pelvis. In some embodiments, the disease detected is an UTUC.

In some embodiments, methods provided herein include detecting mutationsin one or more genes in a sample (e.g., a urine sample) obtained from asubject. For example, methods provided herein can be used to detect thepresence of one or more genetic biomarkers (e.g., one or more mutations)in one or more of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2,CDKN2A, MLL, HRAS, MET, and/or VHL. In some embodiments, methodsprovided herein include detecting the presence of at least one geneticbiomarker (e.g., mutation) in a TERT promoter in a sample obtained froma subject. In some embodiments, the methods provided herein includedetecting the presence of aneuploidy (e.g., monosomy or trisomy) in asample obtained from a subject. In some embodiments, methods providedherein include detecting in a sample obtained from a subject two or moreof: genetic biomarkers (e.g., mutations) in one or more genes (e.g.,TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL),the presence of aneuploidy (e.g., monosomy or trisomy), and the presenceof at least one genetic biomarker (e.g., mutation) in a TERT promoter.In some embodiments, methods provided herein include detecting in asample obtained from a subject each of the presence of one or moregenetic biomarkers (e.g., one or more mutations) in one or more genes(e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/orVHL), the presence of aneuploidy (e.g., monosomy or trisomy), and thepresence of at least one genetic biomarker (e.g., mutation) in a TERTpromoter. 
In some embodiments, methods which include detecting in a sample obtained from a subject each of the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL), the presence of aneuploidy (e.g., monosomy or trisomy), and the presence of at least one genetic biomarker (e.g., mutation) in a TERT promoter provide a better indication that the subject has a cancer (e.g., a bladder cancer or an UTUC) than methods in which fewer than all of these three parameters are tested. In some embodiments, methods which include detecting in a sample obtained from a subject each of the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes (e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL), the presence of aneuploidy (e.g., monosomy or trisomy), and the presence of at least one genetic biomarker (e.g., mutation) in a TERT promoter can increase the specificity and/or sensitivity of detecting cancer (e.g., a bladder cancer or an UTUC).
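The three parameters described above (mutations in the ten-gene panel, a TERT promoter mutation, and aneuploidy) can be recorded per sample and combined. The following is a minimal sketch in Python; the `UrineSampleResult` type, the function names, and the any-positive calling rule are assumptions made for illustration, since the text does not prescribe a particular decision rule here.

```python
from dataclasses import dataclass, field

# The ten genes named in the text for urine samples.
URINE_PANEL = frozenset({
    "TP53", "PIK3CA", "FGFR3", "KRAS", "ERBB2",
    "CDKN2A", "MLL", "HRAS", "MET", "VHL",
})


@dataclass
class UrineSampleResult:
    """Hypothetical per-sample record of the three tested parameters."""
    mutated_genes: set = field(default_factory=set)
    tert_promoter_mutation: bool = False
    aneuploidy: bool = False


def positive_parameters(result: UrineSampleResult) -> list:
    """Return the names of the parameters that are positive."""
    positives = []
    if result.mutated_genes & URINE_PANEL:
        positives.append("gene_panel")
    if result.tert_promoter_mutation:
        positives.append("tert_promoter")
    if result.aneuploidy:
        positives.append("aneuploidy")
    return positives


def is_positive(result: UrineSampleResult) -> bool:
    """Assumed rule: the sample is positive if any parameter is positive."""
    return bool(positive_parameters(result))


sample = UrineSampleResult(mutated_genes={"FGFR3"}, aneuploidy=True)
print(positive_parameters(sample))
```

Testing all three parameters gives the any-positive rule more opportunities to flag a sample than testing one or two, which is one way to read the comparison with methods in which fewer than all three parameters are tested.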

In various embodiments of methods provided herein in which one or more genetic biomarkers (e.g., one or more mutations) in genes in a sample (e.g., a urine sample) obtained from a subject are detected, genetic biomarkers (e.g., mutations) in one or more of TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL can be detected. In some embodiments, genetic biomarkers (e.g., mutations) in 1, 2, 3, 4, 5, 6, 7, 8, or 9 of these genes can be detected. In some embodiments, genetic biomarkers (e.g., mutations) in all 10 of these genes can be detected. In some embodiments, the one or more genetic biomarkers (e.g., mutations) in these genes can be any mutation disclosed herein. For example, the one or more genetic biomarkers (e.g., mutations) in these genes can be a mutation shown in Table 19 or Table 29. In some embodiments, at least one genetic biomarker (e.g., mutation) in one of TP53 or FGFR3 is detected. In some embodiments, at least one genetic biomarker (e.g., at least one mutation) in each of TP53 and FGFR3 is detected.

In some embodiments, methods provided herein include detecting the presence of at least one genetic biomarker (e.g., mutation) in a TERT promoter in a sample (e.g., a urine sample) obtained from a subject. Any genetic biomarker (e.g., mutation) in a TERT promoter disclosed herein can be detected. For example, any of the variety of TERT promoter genetic biomarkers (e.g., mutations) shown in Table 26 or Table 30 can be detected using methods provided herein. In some embodiments, TERT promoter genetic biomarkers (e.g., mutations) that can be detected according to various methods provided herein include the g.1295228 C>T and/or g.1295250 C>T mutations. In some embodiments, TERT promoter genetic biomarkers (e.g., mutations) that can be detected according to various methods provided herein include mutations at positions hg1295228 and/or hg1295250, which are 66 and 88 bp upstream of the transcription start site, respectively. In some embodiments, a genetic biomarker (e.g., a mutation) in a TERT promoter is identified using a singleplex PCR assay. In some embodiments, a genetic biomarker (e.g., a mutation) in a TERT promoter is identified using a multiplex PCR assay. In some embodiments, a single amplification primer can be used to amplify a segment containing the region of the TERT promoter known to harbor genetic biomarkers (e.g., mutations) in cancer (e.g., bladder cancer or UTUC).

As used herein, the term “TERT” refers to the gene and/or the protein encoded by the gene, which is telomerase reverse transcriptase, a catalytic subunit of the enzyme telomerase, which, together with the telomerase RNA component (TERC), comprises the most important unit of the telomerase complex. High rates of activating mutations in the upstream promoter of the TERT gene are found in the majority of BC as well as in other cancer types. TERT promoter mutations commonly affect two hot spots: g.1295228 C>T and g.1295250 C>T. These mutations lead to the generation of CCGGAA/T or GGAA/T motifs, altering the binding site for ETS transcription factors and subsequently increasing TERT promoter activity. TERT promoter mutations occur in up to 80% of invasive urothelial carcinomas of the bladder and upper urinary tract as well as in several of its histologic variants. Moreover, TERT promoter mutations occur in 60-80% of BC precursors, including Papillary Urothelial Neoplasms of Low Malignant Potential, non-invasive Low Grade Papillary Urothelial Carcinoma, non-invasive High Grade Papillary Urothelial Carcinoma and “flat” Carcinoma in Situ (CIS), as well as in urinary cells from a subset of these patients. TERT promoter mutations have thus been established as a common genetic alteration in BC. Human TERT promoter sequences are known in the art.
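The two TERT promoter hot spots named above can be expressed as a small lookup. The following is a minimal sketch in Python, assuming a variant is supplied as a genomic position with reference and alternate bases; the table layout and the function name `classify_tert_variant` are hypothetical. The offsets of 66 and 88 bp upstream of the transcription start site are those stated in the preceding paragraph.

```python
# Hot spot positions and base changes as given in the text.
TERT_PROMOTER_HOTSPOTS = {
    1295228: {"ref": "C", "alt": "T", "bp_upstream_of_tss": 66},
    1295250: {"ref": "C", "alt": "T", "bp_upstream_of_tss": 88},
}


def classify_tert_variant(position: int, ref: str, alt: str):
    """Return a label if the variant is one of the two hot spot
    mutations, otherwise None."""
    hotspot = TERT_PROMOTER_HOTSPOTS.get(position)
    if hotspot is None:
        return None
    ref, alt = ref.upper(), alt.upper()
    if (ref, alt) != (hotspot["ref"], hotspot["alt"]):
        return None
    offset = hotspot["bp_upstream_of_tss"]
    return f"g.{position} {ref}>{alt} ({offset} bp upstream of TSS)"


print(classify_tert_variant(1295228, "C", "T"))
print(classify_tert_variant(1295250, "c", "t"))
print(classify_tert_variant(1295228, "C", "A"))
```

The two positions lie 22 bp apart, which matches the difference between the stated offsets (88 - 66).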

Genetic biomarkers (e.g., mutations) in one or more of the genes described herein can be detected by any of the exemplary techniques for detecting mutations described herein. Moreover, those of ordinary skill in the art will be aware of other suitable methods for detecting genetic biomarkers (e.g., mutations) in these genes.

In some embodiments, methods provided herein include detecting in asample (e.g., a urine sample) obtained from a subject the presence ofone or more genetic biomarkers (e.g., mutations) in one or more of thefollowing genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS,MET, and/or VHL, the presence of at least one genetic biomarker (e.g.,at least one mutation) in a TERT promoter, or both. In some embodiments,the sample is a urine sample. In some embodiments provided herein,methods include isolating such cells from the rest of the sample. Forexample, cells can be completely isolated from other components of thesample, or can be isolated to a degree such that the isolated cellsinclude only small amounts of other material(s) from the sample. In someembodiments, the presence of genetic biomarkers in nucleic acids presentin cells isolated from a sample and/or the presence of aneuploidy incells isolated from a sample can be assayed using any of the variety ofmethods provided herein. For example, nucleic acids present in cellsfrom the sample can be isolated and assayed for the presence of one ormore genetic biomarkers and/or the presence of aneuploidy. In someembodiments, cells are not isolated from the sample prior to isolatingtheir nucleic acids for analysis. In some embodiments, the sampleincludes nucleic acids that harbor one or more genetic biomarkers (e.g.,mutations) in one or more genes (e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2,CDKN2A, MLL, HRAS, MET, and/or VHL), at least one genetic biomarker(e.g., a mutation) in a TERT promoter, and/or aneuploidy (e.g., monosomyor trisomy), which nucleic acids are assayed according to any of thevariety of methods disclosed herein. 
As will be appreciated by those of ordinary skill in the art, such nucleic acids can represent nucleic acids that are shed from cancer cells (e.g., bladder cancer cells or cells from UTUCs) and as such, can be assayed using any of the variety of methods provided herein to determine the presence of a cancer in the subject.

In some embodiments, methods provided herein include detecting in asample (e.g., a urine sample) obtained from a subject the presence ofone or more genetic biomarkers (e.g., mutations) in one or more of thefollowing genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS,MET, and/or VHL, the presence of at least one genetic biomarker (e.g., amutation) in a TERT promoter, or both, wherein at least one of thegenetic biomarkers (e.g., at least one of the mutations) is present at alow frequency in the sample. For example, methods provided herein candetect a genetic biomarker (e.g., a mutation) when the genetic biomarker(e.g., the mutation) is present in 0.01%, 0.02%, 0.03%, 0.04%, 0.05%,0.06%, 0.07%, 0.08%, 0.09%, 0.1%, 0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%,0.8%, 0.9%, 1% or fewer of the cells in the sample. In some embodiments,methods provided herein can detect a genetic biomarker (e.g., amutation) when the genetic biomarker (e.g., the mutation) is present in0.03% or fewer of the cells in the sample. In some embodiments, methodsprovided herein can detect a genetic biomarker (e.g., a mutation) whenthe genetic biomarker (e.g., the mutation) is present in less than0.01%, 0.02%, 0.03%, 0.04%, 0.05%, 0.06%, 0.07%, 0.08%, 0.09%, 0.1%,0.2%, 0.3%, 0.4%, 0.5%, 0.6%, 0.7%, 0.8%, 0.9%, or 1% of the totalnucleic acid present in the sample.

In some embodiments of any of the variety of methods disclosed herein inwhich the presence of one or more genetic biomarkers (e.g., mutations)in one or more genes (e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A,MLL, HRAS, MET, and/or VHL), the presence of aneuploidy (e.g., monosomyor trisomy), and/or the presence of at least one genetic biomarker(e.g., a mutation) in a TERT promoter is detected, cytology may beperformed in combination with or independently of the method. Forexample, cytology can be performed in combination with any of thevariety of methods disclosed herein to improve the detection of a cancer(e.g., a bladder cancer or an UTUC) in the subject. In some embodiments,performing cytology in combination with detecting the presence of one ormore genetic biomarkers (e.g., mutations) in one or more genes (e.g.,TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL),the presence of aneuploidy (e.g., monosomy or trisomy), and/or thepresence of at least one genetic biomarker (e.g., a mutation) in a TERTpromoter increases the sensitivity of the assay as compared to cytologyalone (e.g., by at least 10%, 20%, 30%, 40%, 50%, 60% or more). In someembodiments, performing cytology in combination with detecting thepresence of one or more genetic biomarkers (e.g., mutations) in one ormore genes (e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS,MET, and/or VHL), the presence of aneuploidy (e.g., monosomy ortrisomy), and/or the presence of at least one mutation in a TERTpromoter (e.g., a genetic biomarker in a TERT promoter) increases thespecificity of the assay as compared to cytology alone (e.g., by atleast 10%, 20%, 30%, 40%, 50%, 60% or more). 
In some embodiments,performing cytology in combination with detecting the presence of one ormore genetic biomarkers (e.g., mutations) in one or more genes (e.g.,TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL),the presence of aneuploidy (e.g., monosomy or trisomy), and/or thepresence of at least one genetic biomarker (e.g., a mutation) in a TERTpromoter permits the detection of cancers that would otherwise beundetectable or only rarely detectable with cytology alone (e.g.,low-grade tumors). As another example, cytology can be performedindependently to confirm the presence of a cancer (e.g., a bladdercancer or an UTUC) once its presence is determined by detecting thepresence of one or more genetic biomarkers (e.g., mutations) in one ormore genes (e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS,MET, and/or VHL), aneuploidy (e.g., monosomy or trisomy), and/or thepresence of at least one genetic biomarker (e.g., a mutation) in a TERTpromoter. In some embodiments, methods provided herein include detectingeach of the presence of one or more genetic biomarkers (e.g., mutations)in one or more genes (e.g., TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A,MLL, HRAS, MET, and/or VHL), the presence of aneuploidy (e.g., monosomyor trisomy), the presence of at least one genetic biomarker (e.g., amutation) in a TERT promoter, and performing cytology.

In some embodiments, any of the variety of methods disclosed herein canbe performed on subjects who have previously undergone treatments forcancer (e.g., bladder cancer or UTUC). In some embodiments, methodsprovided herein can be used to determine the efficacy of the treatment.For example, a subject having bladder cancer or UTUC can be administereda treatment (also referred to herein as a “therapeutic intervention”),after which the continued presence of cancer or the amount of cancer (orlack thereof) is determined by detecting the presence of one or moregenetic biomarkers (e.g., mutations) in one or more genes (e.g., TP53,PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL), thepresence of aneuploidy (e.g., monosomy or trisomy), and/or the presenceof at least one genetic biomarker (e.g., a mutation) in a TERT promoter.

Some embodiments of methods provided herein include testing cytological specimens for cancer. In some embodiments, one or more cytological tests are used for diagnosis or screening. In some embodiments, the one or more cytological tests are used for diagnosing cancer. In some embodiments, the one or more cytological tests are used for screening for cancer. In some embodiments, one or more cytological tests are used for classifying a disease or condition. In some embodiments, one or more cytological tests are used for classifying a cancer.

Various methods may be used to collect a sample including, but not limited to, aspiration cytology (e.g., fine needle aspiration), exfoliative cytology (e.g., impression smears and tissue scrapings), and cystoscopy.

In some embodiments, the cytological test includes a gross examination. In some embodiments, the cytological test includes a histological examination. In some embodiments, the cytological test includes a frozen section exam. In some embodiments, the cytological test is administered in conjunction with another method or test. In some embodiments, the other method or test includes a histochemical stain. In some embodiments, the other method or test includes an immunohistochemical stain. In some embodiments, the other method or test includes electron microscopy. In some embodiments, the other method or test includes flow cytometry. In some embodiments, the other method or test includes image cytometry. In some embodiments, the other method or test includes genetic tests. For example, the genetic test may include, but is not limited to, a cytogenetic test, a fluorescent in situ hybridization (FISH) test, and/or a molecular genetic test.

In some embodiments, a molecular genetic test is used on a cytological sample (e.g., a sample on which cytology is also performed) to detect the presence of one or more genetic biomarkers (e.g., mutations) in PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/or PPP2R1A. In some embodiments, a molecular genetic test is used on a cytological sample (e.g., a sample on which cytology is also performed) to detect the presence of one or more genetic biomarkers (e.g., mutations) in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL. In some embodiments, a molecular genetic test is used on a cytological sample (e.g., a sample on which cytology is also performed) to detect the presence of one or more genetic biomarkers (e.g., mutations) in NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A. In some embodiments, a molecular genetic test is used on a cytological sample (e.g., a sample on which cytology is also performed) to detect the presence of one or more genetic biomarkers (e.g., mutations) in TP53.

Protein Biomarkers in Combination with Aneuploidy

In one aspect, provided herein are methods and materials for detectingthe presence of one or more members of a panel of protein biomarkers andthe presence of aneuploidy in one or more samples obtained from asubject. In another aspect, provided herein are methods and materialsfor diagnosing or identifying the presence of a disease in a subject(e.g., identifying the subject as having cancer) by detecting thepresence of one or more members of a panel of protein biomarkers and thepresence of aneuploidy in one or more samples obtained from the subject.In another aspect, provided herein are methods and materials foridentifying a subject as being at risk (e.g., increased risk) of havingor developing a disease (e.g., cancer) by detecting the presence of oneor more members of a panel of protein biomarkers and the presence ofaneuploidy in one or more samples obtained from the subject. In anotheraspect, provided herein are methods and materials for treating a subjectwho has been diagnosed or identified as having a disease (e.g., cancer)or who has been identified as being at risk (e.g., increased risk) ofhaving or developing a disease (e.g., cancer) by detecting the presenceof one or more members of a panel of protein biomarkers and the presenceof aneuploidy in one or more samples obtained from the subject. Inanother aspect, provided herein are methods and materials foridentifying a treatment for a subject who has been diagnosed oridentified as having a disease (e.g., cancer) or who has been identifiedas being at risk (e.g., increased risk) of having or developing adisease (e.g., cancer) by detecting the presence of one or more membersof a panel of protein biomarkers and the presence of aneuploidy in oneor more samples obtained from the subject. 
In another aspect, providedherein are methods and materials for identifying a subject who will oris likely to respond to a treatment by detecting the presence of one ormore members of a panel of protein biomarkers and the presence ofaneuploidy in one or more samples obtained from the subject. In anotheraspect, provided herein are methods and materials for identifying asubject as a candidate for further diagnostic testing by detecting thepresence of one or more members of a panel of protein biomarkers and thepresence of aneuploidy in one or more samples obtained from the subject.In another aspect, provided herein are methods and materials foridentifying a subject as a candidate for increased monitoring bydetecting the presence of one or more members of a panel of proteinbiomarkers and the presence of aneuploidy in one or more samplesobtained from the subject.

In some embodiments, methods provided herein that include detecting thepresence of one or more members of a panel of protein biomarkers and thepresence of aneuploidy in one or more samples obtained from a subjectprovide high sensitivity in the detection or diagnosis of cancer (e.g.,a high frequency or incidence of correctly identifying a subject ashaving cancer). In some embodiments, methods provided herein thatinclude detecting the presence of one or more members of a panel ofprotein biomarkers and the presence of aneuploidy in one or more samplesobtained from a subject provide a sensitivity in the detection ordiagnosis of cancer (e.g., a high frequency or incidence of correctlyidentifying a subject as having cancer) that is higher than thesensitivity provided by separately detecting the presence of one or moremembers of a panel of protein biomarkers or the presence of aneuploidy.In some embodiments, methods and materials provided herein that includedetecting the presence of one or more members of a panel of proteinbiomarkers and the presence of aneuploidy in one or more samplesobtained from a subject provide a sensitivity of at least about 70%, atleast about 75%, at least about 80%, at least about 85%, at least about90%, at least about 91%, at least about 92%, at least about 93%, atleast about 94%, at least about 95%, at least about 96%, at least about97%, at least about 98%, at least about 99%, or higher. In someembodiments, methods and materials provided herein that includedetecting the presence of one or more members of a panel of proteinbiomarkers and the presence of aneuploidy in one or more samplesobtained from a subject provide high sensitivity in detecting a singletype of cancer. 
In some embodiments, methods and materials providedherein that include detecting the presence of one or more members of apanel of protein biomarkers and the presence of aneuploidy in one ormore samples obtained from a subject provide high sensitivity indetecting two or more types of cancers. Any of a variety of cancer typescan be detected using methods and materials provided herein (see, e.g.,the section entitled “Cancers”). In some embodiments, cancers that canbe detected using methods and materials that include detecting thepresence of one or more members of a panel of protein biomarkers and thepresence of aneuploidy in one or more samples obtained from a subjectinclude pancreatic cancer. In some embodiments, cancers that can bedetected using methods and materials that include detecting the presenceof one or more members of a panel of protein biomarkers and the presenceof aneuploidy in one or more samples obtained from a subject includeliver cancer, ovarian cancer, esophageal cancer, stomach cancer,pancreatic cancer, colorectal cancer, lung cancer, or breast cancer. Insome embodiments, cancers that can be detected using methods andmaterials that include detecting the presence of one or more members ofa panel of protein biomarkers and the presence of aneuploidy in one ormore samples obtained from a subject include cancers of the femalereproductive tract (e.g., cervical cancer, endometrial cancer, ovariancancer, or fallopian tubal cancer). In some embodiments, cancers thatcan be detected using methods and materials that include detecting thepresence of one or more members of a panel of protein biomarkers and thepresence of aneuploidy in one or more samples obtained from a subjectinclude bladder cancer or upper-tract urothelial carcinomas.
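The statement above that combined detection can be more sensitive than either test alone can be illustrated with a small calculation. The following is a minimal sketch in Python, assuming a subject is called positive when either the protein biomarker panel or the aneuploidy test is positive. That any-positive rule, the function names, and the ten made-up results are assumptions for illustration, not data from this disclosure.

```python
def sensitivity(calls):
    """Fraction of subjects with cancer who are called positive."""
    calls = list(calls)
    if not calls:
        raise ValueError("at least one subject is required")
    return sum(calls) / len(calls)


def combine_any(protein_calls, aneuploidy_calls):
    """Per-subject call that is positive if either test is positive."""
    if len(protein_calls) != len(aneuploidy_calls):
        raise ValueError("both tests must cover the same subjects")
    return [p or a for p, a in zip(protein_calls, aneuploidy_calls)]


# Hypothetical results for ten subjects known to have cancer.
protein = [True, True, False, False, True, False, False, True, False, False]
aneuploidy = [True, False, True, False, False, True, False, False, False, True]

combined = combine_any(protein, aneuploidy)
print(sensitivity(protein), sensitivity(aneuploidy), sensitivity(combined))
```

Under this rule the combined sensitivity can never be lower than that of either test alone. The same rule applied to subjects without cancer can only add false positives, so the effect on specificity has to be assessed separately.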

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide a specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer) that is higher than the specificity provided by separately detecting the presence of one or more members of a panel of protein biomarkers or the presence of aneuploidy. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide a specificity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. As will be understood by those of ordinary skill in the art, a specificity of 99% means that only 1% of subjects that do not have cancer are incorrectly identified as having cancer. In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in detecting a single cancer (e.g., there is a low probability of incorrectly identifying that subject as having that single cancer type). In some embodiments, methods and materials provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in detecting two or more cancers (e.g., there is a low probability of incorrectly identifying that subject as having those two or more cancer types).
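
For illustration only, the sensitivity and specificity figures discussed above can be computed from counts of correct and incorrect identifications. The following is a minimal sketch; the function names and the cohort counts are hypothetical and are not taken from any study described herein.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of subjects who have cancer that are correctly identified as having cancer."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of subjects who do not have cancer that are correctly identified as not having cancer."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical cohort: 1000 subjects without cancer, 10 of whom are
# incorrectly identified as having cancer.
spec = specificity(true_negatives=990, false_positives=10)
false_positive_rate = 1.0 - spec

# Hypothetical cohort: 100 subjects with cancer, 70 of whom are detected.
sens = sensitivity(true_positives=70, false_negatives=30)

print(f"specificity = {spec:.2%}, false positive rate = {false_positive_rate:.2%}")
print(f"sensitivity = {sens:.2%}")
```

In this hypothetical cohort the specificity is 99%, which corresponds to 1% of subjects that do not have cancer being incorrectly identified as having cancer.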

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), and 2) the presence of aneuploidy. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO), and 2) the presence of aneuploidy. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject, in which the methods include detecting the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), and 2) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, and 2) the presence of aneuploidy. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3, and 2) the presence of aneuploidy. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject, in which the methods include detecting the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, and 2) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, and 2) the presence of aneuploidy. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3, and 2) the presence of aneuploidy. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject, in which the methods include detecting the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, and 2) the presence of aneuploidy, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, and 2) the presence of aneuploidy. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject include detecting the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, and OPN, and 2) the presence of aneuploidy. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject, in which the methods include detecting the presence of: 1) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, and 2) the presence of aneuploidy, a subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing pancreatic cancer.
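
As a purely illustrative sketch, the protein biomarker panels listed above can be represented as sets, and the combined criterion (one or more panel members detected, together with aneuploidy) can be checked as follows. The panel labels, the function names, and the simple logical rule are assumptions made for this example; they are not a description of how any embodiment must be implemented.

```python
# Panel labels are hypothetical; panel members are those recited in the text.
PANELS = {
    "eight_marker": frozenset({"CA19-9", "CEA", "HGF", "OPN", "CA125",
                               "prolactin", "TIMP-1", "MPO"}),
    "eleven_marker": frozenset({"CA19-9", "CEA", "HGF", "OPN", "CA125", "AFP",
                                "prolactin", "TIMP-1", "follistatin", "G-CSF",
                                "CA15-3"}),
    "nine_marker": frozenset({"CA19-9", "CEA", "HGF", "OPN", "CA125", "AFP",
                              "prolactin", "TIMP-1", "CA15-3"}),
    "four_marker": frozenset({"CA19-9", "CEA", "HGF", "OPN"}),
}


def detected_members(panel_name: str, detected_proteins: set) -> list:
    """Return the members of the named panel that were detected, sorted by name."""
    return sorted(PANELS[panel_name] & set(detected_proteins))


def meets_combined_criteria(panel_name: str, detected_proteins: set,
                            aneuploidy_detected: bool) -> bool:
    """Assumed rule: at least one panel member detected AND aneuploidy detected."""
    return aneuploidy_detected and len(detected_members(panel_name, detected_proteins)) >= 1


# Hypothetical result: CEA and AFP detected, aneuploidy detected.
# AFP is not a member of the four-marker panel, so only CEA counts.
example = meets_combined_criteria("four_marker", {"CEA", "AFP"}, aneuploidy_detected=True)
```

Here `example` evaluates to `True` because one member of the four-marker panel (CEA) and aneuploidy are both detected in the hypothetical result.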

A sample obtained from a subject can be any of the variety of samples described herein that contains DNA (e.g., ctDNA in the blood, or DNA present in bladder, cervical, endometrial, or uterine samples) and/or proteins. In some embodiments, DNA (e.g., cell-free DNA (e.g., ctDNA) or DNA present in bladder, cervical, endometrial, or uterine samples) and/or proteins in a sample obtained from the subject are derived from a tumor cell. In some embodiments, DNA (e.g., cell-free DNA (e.g., ctDNA)) in a sample obtained from the subject includes one or more genetic biomarkers and/or aneuploid DNA. In some embodiments, proteins in a sample obtained from the subject include one or more protein biomarkers. Non-limiting examples of samples in which genetic biomarkers and/or protein biomarkers and/or aneuploidy can be detected include a blood sample, a plasma sample, a serum sample, a urine sample, an endometrial sample, a cervical sample, and a uterine sample. In some embodiments, the presence of one or more protein biomarkers and the presence of aneuploidy are detected in a single sample obtained from the subject. In some embodiments, the presence of one or more protein biomarkers is detected in a first sample obtained from a subject, and the presence of aneuploidy is detected in a second sample obtained from the subject.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers (e.g., each member of a panel of protein biomarkers) and the presence of aneuploidy in one or more samples obtained from a subject, an elevated level of one or more members of the panel of protein biomarkers can be detected. For example, an elevated level of a protein biomarker can be a level that is higher than a reference level. A reference level can be any level of the protein biomarker that is not associated with the presence of cancer. For example, a reference level of a protein biomarker can be a level that is present in a reference subject that does not have cancer or does not harbor a cancer cell. A reference level of a protein biomarker can be the average level that is present in a plurality of reference subjects that do not have cancer or do not harbor a cancer cell. A reference level of a protein biomarker in a subject determined to have cancer can be the level that was present in the subject prior to the onset of cancer. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, or each of): CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at an elevated level includes one or more of (e.g., 1, 2, 3, or each of): CA19-9, CEA, HGF, and/or OPN.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers (e.g., each member of a panel of protein biomarkers) and the presence of aneuploidy in one or more samples obtained from a subject, a decreased level of one or more members of the panel of protein biomarkers can be detected. For example, a decreased level of a protein biomarker can be a level that is lower than a reference level. A reference level can be any level of the protein biomarker that is not associated with the presence of cancer. For example, a reference level of a protein biomarker can be a level that is present in a reference subject that does not have cancer or does not harbor a cancer cell. A reference level of a protein biomarker can be the average level that is present in a plurality of reference subjects that do not have cancer or do not harbor a cancer cell. A reference level of a protein biomarker in a subject determined to have cancer can be the level that was present in the subject prior to the onset of cancer. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, or each of): CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or each of): CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3. In some embodiments, a panel of protein biomarkers in which one or more members of the panel is present at a decreased level includes one or more of (e.g., 1, 2, 3, or each of): CA19-9, CEA, HGF, and/or OPN.
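
By way of a non-limiting illustration, the comparison of a measured protein biomarker level to a reference level described above can be sketched as follows, taking the reference level to be the average level in a plurality of reference subjects that do not have cancer. The function names and the numeric values (in arbitrary units) are hypothetical, and a practical assay might apply an additional margin or threshold that is not specified here.

```python
from statistics import mean


def reference_level(levels_in_reference_subjects: list) -> float:
    """Average biomarker level across reference subjects that do not have cancer."""
    return mean(levels_in_reference_subjects)


def is_elevated(measured_level: float, reference: float) -> bool:
    """An elevated level is a level that is higher than the reference level."""
    return measured_level > reference


def is_decreased(measured_level: float, reference: float) -> bool:
    """A decreased level is a level that is lower than the reference level."""
    return measured_level < reference


# Hypothetical levels (arbitrary units) of one protein biomarker
# measured in three reference subjects.
ref = reference_level([10.0, 12.0, 14.0])

elevated = is_elevated(15.0, ref)   # measured level above the reference average
decreased = is_decreased(9.0, ref)  # measured level below the reference average
```

The same comparison could instead use the level that was present in the subject prior to the onset of cancer as the reference, by passing that single value as `reference`.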

In some embodiments, when a subject is determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer (e.g., by detecting: 1) the presence of aneuploidy, and 2) the presence of one or more protein biomarkers in any of the panels described herein as being useful in conjunction with the presence of aneuploidy), the subject is selected as a candidate for (e.g., is selected for) further diagnostic testing (e.g., any of the variety of further diagnostic testing methods described herein), the subject is selected as a candidate for (e.g., is selected for) increased monitoring (e.g., any of the variety of increased monitoring methods described herein), the subject is identified as a subject who will or is likely to respond to a treatment (e.g., any of the variety of therapeutic interventions described herein), the subject is selected as a candidate for (e.g., is selected for) a treatment, a treatment (e.g., any of the variety of therapeutic interventions described herein) is selected for the subject, and/or a treatment (e.g., any of the variety of therapeutic interventions described herein) is administered to the subject. For example, when a subject is determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer, the subject can undergo further diagnostic testing, which further diagnostic testing can confirm the presence of cancer in the subject. Additionally or alternatively, the subject can be monitored at an increased frequency. In some embodiments of a subject determined as having (e.g., diagnosed to have) cancer or determined to be (e.g., diagnosed as being) at elevated risk of having or developing cancer in which the subject undergoes further diagnostic testing and/or increased monitoring, the subject can additionally be administered a therapeutic intervention. In some embodiments, after a subject is administered a therapeutic intervention, the subject undergoes additional further diagnostic testing (e.g., the same type of further diagnostic testing as was performed previously and/or a different type of further diagnostic testing) and/or continued increased monitoring (e.g., increased monitoring at the same or at a different frequency as was previously done). In some embodiments, after a subject is administered a therapeutic intervention and the subject undergoes additional further diagnostic testing and/or additional increased monitoring, the subject is administered another therapeutic intervention (e.g., the same therapeutic intervention as was previously administered and/or a different therapeutic intervention). In some embodiments, after a subject is administered a therapeutic intervention, the subject is tested for 1) the presence of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), and 2) the presence of aneuploidy. In some embodiments, after a subject is administered a therapeutic intervention, the subject is tested for 1) the presence of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, and 2) the presence of aneuploidy. In some embodiments, after a subject is administered a therapeutic intervention, the subject is tested for 1) the presence of one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, and 2) the presence of aneuploidy. In some embodiments, after a subject is administered a therapeutic intervention, the subject is tested for 1) the presence of one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, and 2) the presence of aneuploidy.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). Any of a variety of genetic biomarkers can be detected (e.g., any of the variety of genetic biomarkers and/or genetic biomarker panels described herein).

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO), and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), 2) the presence of aneuploidy, and 3) the presence of one or more members of a panel of genetic biomarkers, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, 2) the presence of aneuploidy, and 3) the presence of one or more members of a panel of genetic biomarkers, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, 2) the presence of aneuploidy, and 3) the presence of one or more members of a panel of genetic biomarkers, the subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing one of the following types of cancer: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and/or breast cancer.

In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) each of the following protein biomarkers: CA19-9, CEA, HGF, and OPN, and 2) the presence of aneuploidy, the methods further include detecting the presence of one or more members of a panel of genetic biomarkers in one or more samples obtained from a subject (e.g., the same sample used to detect either or both of the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, or a different sample). In some embodiments of methods provided herein that include detecting in one or more samples obtained from a subject the presence of: 1) one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN, 2) the presence of aneuploidy, and 3) the presence of one or more members of a panel of genetic biomarkers, a subject is determined as having (e.g., diagnosed to have) or is determined to be (e.g., diagnosed as being) at elevated risk of having or developing pancreatic cancer.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes can further be detected: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in each of the following genes can further be detected: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes can further be detected: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in each of the following genes can further be detected: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and SMAD4.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes can further be detected: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in each of the following genes can further be detected: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes can further be detected: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in each of the following genes can further be detected: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and CDKN2A.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12) of the following genes can further be detected: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/or PPP2R1A. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in each of the following genes can further be detected: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A.

In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes can further be detected: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, and/or TP53. In some embodiments of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy, one or more genetic biomarkers in each of the following genes can further be detected: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, and TP53.

In some embodiments, any of the variety of methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers and the presence of aneuploidy in one or more samples obtained from a subject further include detecting the presence of one or more members of one or more additional classes of biomarkers. Non-limiting examples of such additional classes of biomarkers include: copy number changes, DNA methylation changes, other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements), peptides, and/or metabolites.

In some embodiments, the one or more additional classes of biomarkers include a metabolite biomarker. In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more metabolites indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more metabolites indicative of cancer. Non-limiting examples of metabolites indicative of cancer include: 5-methylthioadenosine (MTA), Glutathione reduced (GSH), N-acetylglutamate, Lactose, N-acetylneuraminate, UDP-acetylglucosamine, UDP-Acetylgalactosamine, UDP-glucuronate, Pantothenate, Arachidonate (20:4n6), Choline, Cytidine 5′-diphosphocholine, Dihomo-linolenate (20:3n3), Docosapentaenoate (DPA 22:5n3), Eicosapentaenoate (EPA 20:5n3), Glycerophosphorylcholine (GPC), Docosahexaenoate (DHA 22:6n3), Linoleate (18:2n6), Cytidine 5′-monophosphate (5′-CMP), Gamma-glutamylglutamate, X-14577, X-11583, Isovalerylcarnitine, Phosphocreatine, 2-Aminoadipic acid, Gluconic acid, O-Acetylcarnitine, aspartic acid, Deamido-NAD+, glutamic acid, Isobutyrylcarnitine, Carnitine, Pyridoxal, Citric acid, Adenosine, ATP, valine, XC0061, Isoleucine, γ-Butyrobetaine, Lactic acid, alanine, phenylalanine, Gluconolactone, leucine, Glutathione (GSSG) divalent, tyrosine, NAD+, XC0016, UTP, creatine, Theobromine, CTP, GTP, 3-Methylhistidine, Succinic acid, Glycerol 3-phosphate, glutamine, 5-Oxoproline, Thiamine, Butyrylcarnitine, 4-Acetamidobutanoic acid, UDP-Glucose, UDP-Galactose, threonine, N-Acetylglycine, proline, ADP, Choline, Malic acid, S-Adenosylmethionine, Pantothenic acid, Cysteinesulfinic acid, 6-Aminohexanoic acid, Homocysteic acid, Hydroxyproline, Methionine sulfoxide, 3-Guanidinopropionic acid, Glucose 6-phosphate, Phenaceturic acid, Threonic acid, tryptophan, Pyridoxine, N-Acetylaspartic acid, 4-Guanidinobutyric acid, serine, Citrulline, Betaine, N-Acetylasparagine, 2-Hydroxyglutaric acid, arginine, Glutathione (GSH), creatinine, Dihydroxyacetone phosphate, histidine, glycine, Glucose 1-phosphate, N-Formylglycine, Ketoprofen, lysine, beta-alanine, N-Acetylglutamic acid, 2-Amino-2-(hydroxymethyl)-1,3-propanediol, Ornithine, Phosphorylcholine, Glycerophosphocholine, Terephthalic acid, Glyceraldehyde 3-phosphate, Gly-Asp, Taurine, Fructose 1,6-diphosphate, 3-Aminoisobutyric acid, Spermidine, GABA, Triethanolamine, Glycerol, N-Acetylserine, N-Acetylornithine, Diethanolamine, AMP, Cysteine glutathione disulfide, Streptomycin sulfate+H2O divalent, trans-Glutaconic acid, Nicotinic acid, Isobutylamine, Betaine aldehyde+H2O, Urocanic acid, 1-Aminocyclopropane-1-carboxylic acid, Homoserinelactone, 5-Aminovaleric acid, 3-Hydroxybutyric acid, Ethanolamine, Isovaleric acid, N-Methylglutamic acid, Cystathionine, Spermine, Carnosine, 1-Methylnicotinamide, N-Acetylneuraminic acid, Sarcosine, GDP, N-Methylalanine, palmitic acid, 1,2-dioleoyl-sn-glycero-3-phospho-rac-glycerol, cholesterol 5α,6α epoxide, lanosterol, lignoceric acid, 1oleoyl_rac_GL, cholesterol_epoxide, erucic acid, T-LCA, oleoyl-L-carnitine, oleanolic acid, 3-phosphoglycerate, 5-hydroxynorvaline, 5-methoxytryptamine, adenosine-5-monophosphate, alpha-ketoglutarate, asparagine, benzoic acid, hypoxanthine, maltose, maltotriose, methionine sulfoxide, nornicotine, phenol, Phosphoethanolamine, pyrophosphate, pyruvic acid, quinic acid, taurine, uric acid, inosine, lactamide, 5-hydroxynorvaline NIST, cholesterol, deoxypentitol, 2-hydroxyestrone, 2-hydroxyestradiol, 2-methoxyestrone, 2-methoxyestradiol, 2-hydroxyestrone-3-methyl ether, 4-hydroxyestrone, 4-methoxyestrone, 4-methoxyestradiol, 16alpha-hydroxyestrone, 17-epiestriol, estriol, 16-Ketoestradiol, 16-epiestriol, acylcarnitine C18:1, amino acids citrulline and trans-4-hydroxyproline, glycerophospholipids PC aa C28:1, PC ae C30:0 and PC ae C30:2, and sphingolipid SM (OH) C14:1. See, e.g., Halama et al., Nesting of colon and ovarian cancer cells in the endothelial niche is associated with alterations in glycan and lipid metabolism, Scientific Reports volume 7, Article number: 39999 (2017); Hur et al., Systems approach to characterize the metabolism of liver cancer stem cells expressing CD133, Sci Rep., 7: 45557, doi: 10.1038/srep45557, (2017); Eliassen et al., Urinary Estrogens and Estrogen Metabolites and Subsequent Risk of Breast Cancer among Premenopausal Women, Cancer Res; 72(3); 696-706 (2011); Gangi et al., Metabolomic profile in pancreatic cancer patients: a consensus-based approach to identify highly discriminating metabolites, Oncotarget, February 2; 7(5): 5815-5829 (2016); Kumar et al., Serum and Plasma Metabolomic Biomarkers for Lung Cancer, Bioinformation, 13(6): 202-208, doi: 10.6026/97320630013202 (2017); Schmidt et al., Pre-diagnostic metabolite concentrations and prostate cancer risk in 1077 cases and 1077 matched controls in the European Prospective Investigation into Cancer and Nutrition, BMC Med., 15: 122, doi: 10.1186/s12916-017-0885-6 (2017); each of which is incorporated herein by reference in its entirety.

In some embodiments, the one or more additional classes of biomarkers include a peptide (e.g., a peptide that is distinct from the various protein biomarkers described herein as being useful in one or more methods). In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more peptides indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more peptides indicative of cancer. In some embodiments, a peptide is derived from a protein (e.g., the peptide includes an amino acid sequence present in a protein biomarker or a different protein). Non-limiting examples of peptides indicative of cancer include the following peptides and peptides derived from the following proteins: CEACAM, CYFRA21-1, CA125, PKLK, ProGRP, NSE, TPA 6, TPA 7, TPA 8, NRG, NRG 100, CNDP, APOB100, SCC, VEGF, EGFR, PIK3CA, HER2, BRAF, ROS, RET, NRAS, MET, MEK1, HER2, C4.4A, PSF3, FAM83B, ECD, CTNNB, VIM, S100A4, S100A7, COX2, MUC1, KLKB1, SAA, HP-β chain, C9, Pgrmc1, Ciz1, Transferrin, α-1 antitrypsin, apolipoprotein 1, complement c3a, Caveolin-1, Kallikrein 6, Glucose regulated protein-8, α-defensin-1, -2, -3, Serum C-peptide, Alpha-2-HS glycoprotein, Tryptic KRT 8 peptide, Plasma glycoprotein, Catenin, Defensin α 6, MMPs, Cyclin D, S100 P, Lamin A/C filament protein, Heat shock protein, aldehyde dehydrogenase, Txl-2 (thioredoxin like protein-2), P53, nm23, u-PA, VEGF, Eph B4, CRABP2, WT-1, Rab-3D, Mesothelin, ERα, ANXA4, PSAT1, SPB5, CEA5, CEA6, A1AT, SLPI, APOA4, VDBP, HE4, IL-1, -6, -7, -8, -10, -11, -12, -16, -18, -21, -23, -28A, -33, LIF, TNFR1-2, HVEM (TNFRSF14), IL1R-a, IL1R-b, IL-2R, M-CSF, MIP-1a, TNF-α, CD40, RANTES, CD40L, MIF, IFN-β, MCP-4 (CCL13), MIG (CXCL9), MIP-1δ (CCL15), MIP3a (CCL20), MIP-4 (CCL18), MPIF-1, SDF-1a+b (CXCL12), CD137/4-1BB, lymphotactin (XCL1), eotaxin-1 (CCL11), eotaxin-2 (CCL24), 6Ckine (CCL21), BLC (CXCL13), CTACK (CCL27), BCA-1 (CXCL13), HCC4 (CCL16), CTAP-3 (CXCL7), IGF1, VEGF, VEGFR3, EGFR, ErbB2, CTGF, PDGF AA, BB, PDGFRb, bFGF, TGFbRIII, β-cellulin, IGFBP1-4, 6, BDNF, PEDF, angiopoietin-2, renin, lysophosphatidic acid, β2-microglobulin, sialyl TN, ACE, CA 19-9, CEA, CA 15-3, CA-50, CA 72-4, OVX1, mesothelin, sialyl TN, MMP-2, -3, -7, -9, VAP-1, TIMP1-2, tenascin C, VCAM-1, osteopontin, KIM-1, NCAM, tetranectin, nidogen-2, cathepsin L, prostasin, matriptase, kallikreins 2, 6, 10, cystatin C, claudin, spondin2, SLPI, bHCG, urinary gonadotropin peptide, inhibin, leptin, adiponectin, GH, TSH, ACTH, PRL, FSH, LH, cortisol, TTR, osteocalcin, insulin, ghrelin, GIP, GLP-1, amylin, glucagon, peptide YY, follistatin, hepcidin, CRP, Apo A1, CIII, H, transthyretin, SAA, SAP, complement C3,4, complement factor H, albumin, ceruloplasmin, haptoglobin, β-hemoglobin, transferrin, ferritin, fibrinogen, thrombin, von Willebrand factor, myoglobin, immunosuppressive acidic protein, lipid-associated sialic acid, S100A12 (EN-RAGE), fetuin A, clusterin, α1-antitrypsin, α2-macroglobulin, serpin1 (human plasminogen activator inhibitor-1), Cox-1, Hsp27, Hsp60, Hsp80, Hsp90, lectin-type oxidized LDL receptor 1, CD14, lipocalin 2, ITIH4, sFasL, Cyfra21-1, TPA, perforin, DcR3, AGRP, creatine kinase-MB, human milk fat globule 1-2, NT-Pro-BNP, neuron-specific enolase, CASA, NB/70K, AFP, afamin, collagen, prohibitin, keratin-6, PARC, B7-H4, YK-L40, AFP-L3, DCP, GPC3, OPN, GP73, CK19, MDK, A2, 5-HIAA, CA15-3, CA19-9, CA27.29, CA72-4, calcitonin, CGA, BRAF V600E, BAP, BCR-ABL fusion protein, KIT, KRAS, PSA, Lactate dehydrogenase, NMP22, PAI-1, uPA, fibrin D-dimer, S100, TPA, thyroglobulin, CD20, CD24, CD44, RS/DJ-1, p53, alpha-2-HS-glycoprotein, lipophilin B, beta-globin, hemopexin, UBE2N, PSMB6, PPP1CB, CPT2, COPA, MSK1/2, Pro-NPY, Secernin-1, Vinculin, NAAA, PTK7, TFG, MCCC2, TRAP1, IMPDH2, PTEN, POSTN, EPLIN, eIF4A3, DDAH1, ARG2, PRDX3&4, P4HB, YWHAG, Enoyl CoA-hydrase, PHB, TUBB, KRT2, DES, HSP71, ATP5B, CKB, HSPD1, LMNA, EZH2, AMACR, FABP5, PPA2, EZR, SLP2, SM22, Bax, Smac/Diablo phosphorylated Bcl2, STAT3 and Smac/Diablo expression, PHB, PAP, AMACR, PSMA, FKBP4, PRDX4, KRT7/8/18, GSTP1, NDPK1, MTX2, GDF15, PCa-24, Caveolin-2, Prothrombin, Antithrombin-III, Haptoglobin, Serum amyloid A-1 protein, ZAG, ORM2, APOC3, CALML5, IGFBP2, MUC5AC, PNLIP, PZP, TIMP1, AMBP, inter-alpha-trypsin inhibitor heavy chain H1, inter-alpha-trypsin inhibitor heavy chain H2, inter-alpha-trypsin inhibitor heavy chain H3, V-type proton ATPase subunit B, kidney isoform, Hepatocyte growth factor-like protein, Serum amyloid P-component, Acylglycerol kinase, Leucine-rich repeat-containing protein 9, Beta-2-glycoprotein 1, Plasma protease C1 inhibitor, Lipoxygenase homology domain-containing protein 1, Protocadherin alpha-13. See, e.g., Kuppusamy et al., Volume 24, Issue 6, September 2017, Pages 1212-1221; Elzek and Rodland, Cancer Metastasis Rev. 2015 March; 34(1): 83-96; Noel and Lokshin, Future Oncol. 2012 January; 8(1): 55-71; Tsuchiya et al., World J Gastroenterol. 2015 Oct. 7; 21(37): 10573-10583; Lou et al., Biomark Cancer. 2017; 9: 1-9; Park et al., Oncotarget. 2017 Jun. 27; 8(26): 42761-42771; Saraswat et al., Cancer Med. 2017 July; 6(7): 1738-1751; Zamay et al., Cancers (Basel). 2017 November; 9(11): 155; Tanase et al., Oncotarget. 2017 Mar. 14; 8(11): 18497-18512, each of which is incorporated herein by reference in its entirety.

In some embodiments, the one or more additional classes of biomarkers include nucleic acid lesions or variations (e.g., a nucleic acid lesion or variation that is distinct from the various genetic biomarkers described herein as being useful in one or more methods). In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. Non-limiting examples of nucleic acid lesions or variations include copy number changes, DNA methylation changes, and/or other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements). Translocations and genomic rearrangements have been correlated with various cancers (e.g., prostate, glioma, lung cancer, non-small cell lung cancer, melanoma, and thyroid cancer) and used as biomarkers for years (e.g., Demeure et al., 2014, World J Surg., 38:1296-305; Hogenbirk et al., 2016, PNAS USA, 113:E3649-56; Gasi et al., 2011, PLoS One, 6:e16332; Ogiwara et al., 2008, Oncogene, 27:4788-97; U.S. Pat. Nos. 9,745,632; and 6,576,420). In addition, changes in copy number have been used as biomarkers for various cancers including, without limitation, head and neck squamous cell carcinoma, lymphoma (e.g., non-Hodgkin's lymphoma) and colorectal cancer (Kumar et al., 2017, Tumour Biol, 39:1010428317740296; Kumar et al., 2017, Tumour Biol., 39:1010428317736643; Henrique et al., 2014, Expert Rev. Mol. Diagn., 14:419-22; and U.S. Pat. No. 9,816,139). DNA methylation and changes in DNA methylation (e.g., hypomethylation, hypermethylation) also are used as biomarkers in cancer. For example, hypomethylation has been associated with hepatocellular carcinoma (see, for example, Henrique et al., 2014, Expert Rev. Mol. Diagn., 14:419-22), esophageal carcinogenesis (see, for example, Alvarez et al., 2011, PLoS Genet., 7:e1001356) and gastric and liver cancer (see, for example, U.S. Pat. No. 8,728,732), and hypermethylation has been associated with colorectal cancer (see, for example, U.S. Pat. No. 9,957,570). In addition to genome-wide changes in methylation, specific methylation changes within particular genes can be indicative of specific cancers (see, for example, U.S. Pat. No. 8,150,626). Li et al. (2012, J. Epidemiol., 22:384-94) provides a review of the association between numerous cancers (e.g., breast, bladder, gastric, lung, prostate, head and neck squamous cell, and nasopharyngeal) and aberrant methylation. Additionally or alternatively, additional types of nucleic acids or features of nucleic acids have been associated with various cancers. Non-limiting examples of such nucleic acids or features of nucleic acids include: the presence or absence of various microRNAs (miRNAs), which have been used in the diagnosis of colon, prostate, colorectal, and ovarian cancers (see, for example, D'Souza et al., 2018, PLoS One, 13:e0194268; Fukagawa et al., 2017, Cancer Sci., 108:886-96; Giraldez et al., 2018, Methods Mol. Biol., 1768:459-74; U.S. Pat. Nos. 8,343,718; 9,410,956; and 9,074,206; for a review on the specific association of miR-22 with cancer, see Wang et al., 2017, Int. J. Oncol., 50:345-55); the abnormal expression of long non-coding RNAs (lncRNAs), which also has been used as a biomarker in cancers such as prostate cancer, colorectal cancer, cervical cancer, melanoma, non-small cell lung cancer, gastric cancer, endometrial carcinoma, and hepatocellular carcinoma (see, for example, Wang et al., 2017, Oncotarget, 8:58577086; Wang et al., 2018, Mol. Cancer, 17:110; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4812-9; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:993-1002; Zhang et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4820-7; Zhang et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:2304-9; Xie et al., 2018, EBioMedicine, 33:57-67; and U.S. Pat. No. 9,410,206); the presence or absence of circular RNA (circRNA), which has been used as a biomarker in lung cancer, breast cancer, gastric cancer, colorectal cancer, and liver cancer (e.g., Geng et al., 2018, J. Hematol. Oncol., 11:98) and melanoma (e.g., Zhang et al., 2018, Oncol. Lett., 16:1219-25); changes in telomeric DNA (e.g., in length or in heterozygosity) or centromeric DNA (e.g., changes in expression of centromeric genes), which also have been associated with cancers (e.g., prostate, breast, lung, lymphoma, and Ewing's sarcoma) (see, for example, Baretton et al., 1994, Cancer Res., 54:4472-80; Liscia et al., 1999, Br. J. Cancer, 80:821-6; Proctor et al., 2009, Biochim. Biophys. Acta, 1792:260-74; and Sun et al., 2016, Int. J. Cancer, 139:899-907); various mutations (e.g., deletions), rearrangements and/or copy number changes in mitochondrial DNA (mtDNA), which have been used prognostically and diagnostically for various cancers (e.g., prostate cancer, melanoma, breast cancer, lung cancer, and colorectal cancer) (see, for example, Maragh et al., 2015, Cancer Biomark., 15:763-73; Shen et al., 2010, Mitochondrion, 10:62-68; Hosgood et al., 2010, Carcinogen., 31:847-9; Thyagarajan et al., 2012, Cancer Epid. Biomarkers & Prev., 21:1574-81; and U.S. Pat. No. 9,745,632); and the abnormal presence, absence or amount of messenger RNAs (mRNAs), which also has been correlated with various cancers including, without limitation, breast cancer, Wilms' tumors, and cervical cancer (see, for example, Guetschow et al., 2012, Anal. Bioanaly. Chem., 404:399-406; Schwienbacher et al., 2000, Cancer Res., 60:1521-5; and Ngan et al., 1997, Genitourin Med., 73:54-8). Each of these citations is incorporated herein by reference in its entirety.

Single Class of Biomarkers or Aneuploidy

In one aspect, provided herein are methods and materials for detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for diagnosing or identifying the presence of a disease in a subject (e.g., identifying the subject as having cancer) by detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for identifying a subject as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for treating a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for identifying a treatment for a subject who has been diagnosed or identified as having a disease (e.g., cancer) or who has been identified as being at risk (e.g., increased risk) of having or developing a disease (e.g., cancer) by detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for identifying a subject who will or is likely to respond to a treatment by detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for identifying a subject as a candidate for further diagnostic testing by detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject. In another aspect, provided herein are methods and materials for identifying a subject as a candidate for increased monitoring by detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide high sensitivity in the detection or diagnosis of cancer (e.g., a high frequency or incidence of correctly identifying a subject as having cancer). In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide a sensitivity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide high sensitivity in detecting a single type of cancer. In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide high sensitivity in detecting two or more types of cancers. Any of a variety of cancer types can be detected using methods and materials provided herein (see, e.g., the section entitled “Cancers”). In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject include liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, or breast cancer. In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject include pancreatic cancer. In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject include cancers of the female reproductive tract (e.g., cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer). In some embodiments, cancers that can be detected using methods and materials that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject include bladder cancer or upper-tract urothelial carcinomas.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in the detection or diagnosis of cancer (e.g., a low frequency or incidence of incorrectly identifying a subject as having cancer when that subject does not have cancer). In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide a specificity of at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 91%, at least about 92%, at least about 93%, at least about 94%, at least about 95%, at least about 96%, at least about 97%, at least about 98%, at least about 99%, or higher. As will be understood by those of ordinary skill in the art, a specificity of 99% means that only 1% of subjects that do not have cancer are incorrectly identified as having cancer. In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in detecting a single cancer (e.g., there is a low probability of incorrectly identifying that subject as having that single cancer type). In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers (e.g., genetic biomarkers or protein biomarkers) or the presence of aneuploidy in one or more samples obtained from a subject provide high specificity in detecting two or more cancers (e.g., there is a low probability of incorrectly identifying that subject as having those two or more cancer types).
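As a non-limiting illustration of the sensitivity and specificity figures discussed above, the following minimal sketch computes both quantities from the outcome counts of a test cohort. The function names and the cohort counts are hypothetical and are chosen only to make the arithmetic concrete; they are not results of any method described herein.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of subjects who have cancer that are correctly identified."""
    with_cancer = true_positives + false_negatives
    if with_cancer == 0:
        raise ValueError("cohort contains no subjects with cancer")
    return true_positives / with_cancer


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of subjects without cancer that are correctly identified."""
    without_cancer = true_negatives + false_positives
    if without_cancer == 0:
        raise ValueError("cohort contains no subjects without cancer")
    return true_negatives / without_cancer


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens = sensitivity(true_positives=70, false_negatives=30)
spec = specificity(true_negatives=990, false_positives=10)

# Share of cancer-free subjects incorrectly identified as having cancer.
false_positive_rate = 1.0 - spec

print(f"sensitivity = {sens:.0%}")
print(f"specificity = {spec:.0%}")
print(f"false-positive rate = {false_positive_rate:.0%}")
```

With these hypothetical counts, 10 of 1,000 subjects who do not have cancer are incorrectly identified as having cancer, which corresponds to the 99% specificity case described above.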

In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers include detecting the presence of one or more members of a panel of genetic biomarkers.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in each of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS.
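As a non-limiting illustration of the distinction drawn above between detecting genetic biomarkers in "one or more" of the listed genes and in "each" of them, the following minimal sketch compares a list of genes in which biomarkers were detected against the 16-gene panel. The function name, the input format (a plain list of gene symbols), and the example sample are assumptions made for illustration only; how a genetic biomarker is actually detected is outside the scope of this sketch.

```python
# The 16 genes listed above.
PANEL_16 = frozenset({
    "NRAS", "CTNNB1", "PIK3CA", "FBXW7", "APC", "EGFR", "BRAF", "CDKN2A",
    "PTEN", "FGFR2", "HRAS", "KRAS", "AKT1", "TP53", "PPP2R1A", "GNAS",
})


def panel_genes_with_biomarkers(detected_genes, panel=PANEL_16):
    """Return, in sorted order, the panel genes in which a biomarker was detected."""
    return sorted(set(detected_genes) & panel)


# Hypothetical sample: biomarkers detected in KRAS, TP53, and SMAD4.
# SMAD4 is not a member of this 16-gene panel, so it is not counted.
hits = panel_genes_with_biomarkers(["KRAS", "TP53", "SMAD4"])

in_one_or_more = len(hits) >= 1          # "one or more of the following genes"
in_each = len(hits) == len(PANEL_16)     # "each of the following genes"

print(hits, in_one_or_more, in_each)
```

The same comparison could be applied to any of the other gene panels recited herein by passing a different set as the `panel` argument.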

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and/or SMAD4. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in each of the following genes: KRAS (e.g., genetic biomarkers in codons 12 and/or 61), TP53, CDKN2A, and SMAD4.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in each of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and CDKN2A. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12) of the following genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/or PPP2R1A. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in each of the following genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in TP53.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of genetic biomarkers include detecting one or more genetic biomarkers in each of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and myeloperoxidase (MPO). In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting each of the following protein biomarkers: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and CA15-3. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting one or more (e.g., 1, 2, 3, or 4) of the following protein biomarkers: CA19-9, CEA, HGF, and/or OPN. In some embodiments, methods provided herein that include detecting the presence of one or more members of a panel of protein biomarkers include detecting each of the following protein biomarkers: CA19-9, CEA, HGF, and OPN.

In some embodiments, methods provided herein that include detecting the presence of aneuploidy include detecting aneuploidy on one or more of chromosome arms 5q, 8q, and/or 9p. In some embodiments, methods provided herein that include detecting the presence of aneuploidy include detecting aneuploidy on one or more of chromosome arms 4p, 7q, 8q, and/or 9q.

In some embodiments, methods provided herein that include detecting the presence of one or more members of a single class of biomarkers include detecting the presence of one or more members of a class of biomarkers including, without limitation: copy number changes, DNA methylation changes, other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements), peptides, or metabolites.

In some embodiments, methods provided herein include detecting thepresence of one or more metabolites. In some embodiments, a subject isdetermined to be at elevated risk of having or developing cancer if thebiological sample contains one or more metabolites indicative of cancer.In some embodiments, a subject is determined as having cancer if thebiological sample contains one or more metabolites indicative of cancer.Non-limiting examples of metabolites indicative of cancer include:5-methylthioadenosine (MTA), Glutathione reduced (GSH),N-acetylglutamate, Lactose, N-acetylneuraminate, UDP-acetylglucosamine,UDP-Acetylgalactosamine, UDP-glucuronate, Pantothenate, Arachidonate(20:4n6), Choline, Cytidine 5′-diphosphocholine, Dihomo-linolenate(20:3n3), Docosapentaenoate (DPA 22:5n3), Eicosapentaenoate (EPA20:5n3), Glycerophosphorylcholine (GPC), Docosahexaenoate (DHA 22:6n3),Linoleate (18:2n6), Cytidine 5′-monophosphate (5′-CMP),Gamma-glutamylglutamate, X-14577, X-11583, Isovalerylcarnitine,Phosphocreatine, 2-Aminoadipic acid, Gluconic acid, 0-Acetylcarnitine,aspartic acid, Deamido-NAD+, glutamic acid, Isobutyrylcarnitine,Carnitine, Pyridoxal, Citric acid, Adenosine, ATP, valine, XC0061,Isoleucine, γ-Butyrobetaine, Lactic acid, alanine, phenylalanine,Gluconolactone, leucine, Glutathione (GSSG) divalent, tyrosine, NAD+,XC0016, UTP, creatine, Theobromine, CTP, GTP, 3-Methylhistidine,Succinic acid, Glycerol 3-phosphate, glutamine, 5-Oxoproline, Thiamine,Butyrylcarnitine, 4-Acetamidobutanoic acid, UDP-Glucose, UDP-Galactose,threonine, N-Acetylglycine, proline, ADP, Choline, Malic acid,S-Adenosylmethionine, Pantothenic acid, Cysteinesulfinic acid,6-Aminohexanoic acid, Homocysteic acid, Hydroxyproline, Methioninesulfoxide, 3-Guanidinopropionic acid, Glucose 6-phosphate, Phenaceturicacid, Threonic acid, tryptophan, Pyridoxine, N-Acetylaspartic acid,4-Guanidinobutyric acid, serine, Citrulline, Betaine,N-Acetylasparagine, 2-Hydroxyglutaric acid, arginine, Glutathione 
(GSH),creatinine, Dihydroxyacetone phosphate, histidine, glycine, Glucose1-phosphate, N-Formylglycine, Ketoprofen, lysine, beta-alanine,N-Acetylglutamic acid, 2-Amino-2-(hydroxymethyl)-1,3-propanediol,Ornithine, Phosphorylcholine, Glycerophosphocholine, Terephthalic acid,Glyceraldehyde 3-phosphate, Gly-Asp, Taurine, Fructose 1,6-diphosphate,3-Aminoisobutyric acid, Spermidine, GABA, Triethanolamine, Glycerol,N-Acetylserine, N-Acetylornithine, Diethanolamine, AMP, Cysteineglutathione disulfide, Streptomycin sulfate+H2O divalent,trans-Glutaconic acid, Nicotinic acid, Isobutylamine, Betainealdehyde+H2O, Urocanic acid, 1-Aminocyclopropane-1-carboxylic acidHomoserinelactone, 5-Aminovaleric acid, 3-Hydroxybutyric acid,Ethanolamine, Isovaleric acid, N-Methylglutamic acid, Cystathionine,Spermine, Carnosine, 1-Methylnicotinamide, N-Acetylneuraminic acid,Sarcosine, GDP, N-Methylalanine, palmitic acid,1,2-dioleoyl-sn-glycero-3-phospho-rac-glycerolcholesterol 5α,6αepoxidelanosterol, lignoceric acid, 1oleoyl_rac_GL, cholesterol_epoxide,erucic acid, T-LCA, oleoyl-L-carnitine, oleanolic acid,3-phosphoglycerate, 5-hydroxynorvaline, 5-methoxytryptamine,adenosine-5-monophosphate, alpha-ketoglutarate, asparagine, benzoicacid, hypoxanthine, maltose, maltotriose, methionine sulfoxide,nornicotine, phenol, Phosphoethanolamine, pyrophosphate, pyruvic acid,quinic acid, taurine, uric acid, inosine, lactamide, 5-hydroxynorvalineNIST, cholesterol, deoxypentitol, 2-hydroxyestrone, 2-hydroxyestradiol,2-metholyestrone, 2-metholxyestradiol, 2-hydroxyestrone-3-methyl ether,4-hydroxyestrone, 4-metholxyestrone, 4-methoxyestradiol,16alpha-hydroxyestrone, 17-epiestriol, estriol, 16-Ketoestradiol,16-epiestriol, acylcarnitine C18:1, amino acids citrulline andtrans-4-hydroxyproline, glycerophospholipids PC aa C28:1, PC ae C30:0and PC ae C30:2, and sphingolipid SM (OH) C14:1. 
See, e.g., Halama et al., Nesting of colon and ovarian cancer cells in the endothelial niche is associated with alterations in glycan and lipid metabolism, Scientific Reports volume 7, Article number: 39999 (2017); Hur et al., Systems approach to characterize the metabolism of liver cancer stem cells expressing CD133, Sci Rep., 7: 45557, doi: 10.1038/srep45557, (2017); Eliassen et al., Urinary Estrogens and Estrogen Metabolites and Subsequent Risk of Breast Cancer among Premenopausal Women, Cancer Res; 72(3); 696-706 (2011); Gangi et al., Metabolomic profile in pancreatic cancer patients: a consensus-based approach to identify highly discriminating metabolites, Oncotarget, February 2; 7(5): 5815-5829 (2016); Kumar et al., Serum and Plasma Metabolomic Biomarkers for Lung Cancer, Bioinformation, 13(6): 202-208, doi: 10.6026/97320630013202 (2017); Schmidt et al., Pre-diagnostic metabolite concentrations and prostate cancer risk in 1077 cases and 1077 matched controls in the European Prospective Investigation into Cancer and Nutrition, BMC Med., 15: 122, doi: 10.1186/s12916-017-0885-6 (2017); each of which is incorporated herein by reference in its entirety.

In some embodiments, methods provided herein include detecting thepresence of one or more peptides (e.g., one or more peptides that aredistinct from the various protein biomarkers described herein as beinguseful in one or more methods). In some embodiments, a subject isdetermined to be at elevated risk of having or developing cancer if thebiological sample contains one or more peptides indicative of cancer. Insome embodiments, a subject is determined as having cancer if thebiological sample contains one or more peptides indicative of cancer. Insome embodiments, a peptide is derived from a protein (e.g., the peptideincludes an amino acid sequence present in a protein biomarker or adifferent protein). Non-limiting examples of peptides indicative ofcancer include the following peptides and peptides derived from thefollowing proteins: CEACAM, CYFRA21-1, CA125, PKLK, ProGRP, NSE, TPA 6,TPA 7, TPA 8, NRG, NRG 100, CNDP, APOB100, SCC, VEGF, EGFR, PIK3CA,HER2, BRAF, ROS, RET, NRAS, MET, MEK1, HER2, C4.4A, PSF3, FAM83B, ECD,CTNNB, VIM, S100A4, S100A7, COX2, MUC1, KLKB1, SAA, HP-β chain, C9,Pgrmc1, Ciz1, Transferrin, α-1 antitrypsin, apolipo protein 1,complement c3a, Caveolin-1, Kallikrein 6, Glucose regulated protein-8, adefensing-1,-2,-3, Serum C-peptide, Alpha-2-HS glycol protein, TrypticKRT 8 peptide, Plasma glycol protein, Catenin, Defensin α 6, MMPs,Cyclin D, S100 P, Lamin A/C filament protein, Heat shock protein,aldehyde dehydrogenase, Tx1-2, (thioredoxin like protein-2), P53, nm23,u-PA, VEGF, Eph B4, CRABP2, WT-1, Rab-3D, Mesothelin, ERα, ANXA4, PSAT1,SPB5, CEA5, CEA6, A1AT, SLPI, APOA4, VDBP, HE4, IL-1, -6, -7, -8, -10,-11, -12, -16, -18, -21, -23, -28A, -33, LIF, TNFR1-2, HVEM (TNFRSF14),IL1R-a, IL1R-b, IL-2R, M-CSF, MIP-la, TNF-α, CD40, RANTES, CD40L, MIF,IFN-β, MCP-4 (CCL13), MIG (CXCL9), MIP-1δ (CCL15), MIP3a (CCL20), MIP-4(CCL18), MPIF-1, SDF-1a+b (CXCL12), CD137/4-1BB, lymphotactin (XCL1),eotaxin-1 (CCL11), eotaxin-2 (CCL24), 6Ckine/CCL21), BLC (CXCL13), 
CTACK(CCL27), BCA-1 (CXCL13), HCC4 (CCL16), CTAP-3 (CXCL7), IGF1, VEGF,VEGFR3, EGFR, ErbB2, CTGF, PDGF AA, BB, PDGFRb, bFGF, TGFbRIII,β-cellulin, IGFBP1-4, 6, BDNF, PEDF, angiopoietin-2, renin,lysophosphatidic acid, J32-microglobulin, sialyl TN, ACE, CA 19-9, CEA,CA 15-3, CA-50, CA 72-4, OVX1, mesothelin, sialyl TN, MMP-2, -3, -7, -9,VAP-1, TIMP1-2, tenascin C, VCAM-1, osteopontin, KIM-1, NCAM,tetranectin, nidogen-2, cathepsin L, prostasin, matriptase, kallikreins2, 6, 10, cystatin C, claudin, spondin2, SLPI, bHCG, urinarygonadotropin peptide, inhibin, leptin, adiponectin, GH, TSH, ACTH, PRL,FSH, LH, cortisol, TTR, osteocalcin, insulin, ghrelin, GIP, GLP-1,amylin, glucagon, peptide YY, follistatin, hepcidin, CRP, Apo A1, CIII,H, transthyretin, SAA, SAP, complement C3,4, complement factor H,albumin, ceruloplasmin, haptoglobin, β-hemoglobin, transferrin,ferritin, fibrinogen, thrombin, von Willebrand factor, myoglobin,immunosuppressive acidic protein, lipid-associated sialic acid, S100A12(EN-RAGE), fetuin A, clusterin, α1-antitrypsin, α2-macroglobulin,serpin1 (human plasminogen activator inhibitor-1), Cox-1, Hsp27, Hsp60,Hsp80, Hsp90, lectin-type oxidized LDL receptor 1, CD14, lipocalin 2,ITIH4, sFasL, Cyfra21-1, TPA, perforin, DcR3, AGRP, creatine kinase-MB,human milk fat globule 1-2, NT-Pro-BNP, neuron-specific enolase, CASA,NB/70K, AFP, afamin, collagen, prohibitin, keratin-6, PARC, B7-H4,YK-L40, AFP-L3, DCP, GPC3, OPN, GP73, CK19, MDK, A2, 5-HIAA, CA15-3,CA19-9, CA27.29, CA72-4, calcitonin, CGA, BRAF V600E, BAP, BCT-ABLfusion protein, KIT, KRAS, PSA, Lactate dehydrogenase, NMP22, PAI-1,uPA, fibrin D-dimer, 5100, TPA, thyroglobulin, CD20, CD24, CD44,RS/DJ-1, p53, alpha-2-HS-glycoprotein, lipophilin B, beta-globin,hemopexin, UBE2N, PSMB6, PPP1CB, CPT2, COPA, MSK1/2, Pro-NPY,Secernin-1, Vinculin, NAAA, PTK7, TFG, MCCC2, TRAP1, IMPDH2, PTEN,POSTN, EPLIN, eIF4A3, DDAH1, ARG2, PRDX3&4, P4HB, YWHAG, EnoylCoA-hydrase, PHB, TUBB, KRT2, DES, HSP71, ATPSB, CKB, HSPD1, LMNA, 
EZH2,AMACR, FABP5, PPA2, EZR, SLP2, SM22, Bax, Smac/Diablo phosphorylatedBcl2, STAT3 and Smac/Diablo expression, PHB, PAP, AMACR, PSMA, FKBP4,PRDX4, KRT7/8/18, GSTP1, NDPK1, MTX2, GDF15, PCa-24, Caveolin-2,Prothrombin, Antithrombin-III, Haptoglobin, Serum amyloid A-1 protein,ZAG, ORM2, APOC3, CALML5, IGFBP2, MUC5AC, PNLIP, PZP, TIMP1, AMBP,inter-alpha-trypsin inhibitor heavy chain H1, inter-alpha-trypsininhibitor heavy chain H2, inter-alpha-trypsin inhibitor heavy chain H3,V-type proton ATPase subunit B, kidney isoform, Hepatocyte growthfactor-like protein, Serum amyloid P-component, Acylglycerol kinase,Leucine-rich repeat-containing protein 9, Beta-2-glycoprotein 1, Plasmaprotease C1 inhibitor, Lipoxygenase homology domain-containing protein1, Protocadherin alpha-13. See, e.g., Kuppusamy et al., Volume 24, Issue6, September 2017, Pages 1212-1221; Elzek and Rodland, Cancer MetastasisRev. 2015 March; 34(1): 83-96; Noel and Lokshin, Future Oncol. 2012January; 8(1): 55-71; Tsuchiya et al., World J Gastroenterol. 2015 Oct.7; 21(37): 10573-10583; Lou et al., Biomark Cancer. 2017; 9: 1-9; Parket al., Oncotarget. 2017 Jun. 27; 8(26): 42761-42771; Saraswat et al.,Cancer Med. 2017 July; 6(7): 1738-1751; Zamay et al., Cancers (Basel).2017 November; 9(11): 155; Tanase et al., Oncotarget. 2017 Mar. 14;8(11): 18497-18512, each of which is incorporated herein by reference inits entirety.

In some embodiments, methods provided herein include detecting the presence of one or more nucleic acid lesions or variations (e.g., one or more nucleic acid lesions or variations that are distinct from the various genetic biomarkers described herein as being useful in one or more methods). In some embodiments, a subject is determined to be at elevated risk of having or developing cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. In some embodiments, a subject is determined as having cancer if the biological sample contains one or more nucleic acid lesions or variations indicative of cancer. Non-limiting examples of nucleic acid lesions or variations include copy number changes, DNA methylation changes, and/or other nucleic acids (e.g., mRNAs, miRNAs, lncRNAs, circRNA, mtDNA, telomeric DNA, translocation and genomic rearrangements). Translocations and genomic rearrangements have been correlated with various cancers (e.g., prostate, glioma, lung cancer, non-small cell lung cancer, melanoma, and thyroid cancer) and used as biomarkers for years (e.g., Demeure et al., 2014, World J Surg., 38:1296-305; Hogenbirk et al., 2016, PNAS USA, 113:E3649-56; Gasi et al., 2011, PLoS One, 6:e16332; Ogiwara et al., 2008, Oncogene, 27:4788-97; and U.S. Pat. Nos. 9,745,632 and 6,576,420). In addition, changes in copy number have been used as biomarkers for various cancers including, without limitation, head and neck squamous cell carcinoma, lymphoma (e.g., non-Hodgkin's lymphoma) and colorectal cancer (Kumar et al., 2017, Tumour Biol., 39:1010428317740296; Kumar et al., 2017, Tumour Biol., 39:1010428317736643; Henrique et al., 2014, Expert Rev. Mol. Diagn., 14:419-22; and U.S. Pat. No. 9,816,139). DNA methylation and changes in DNA methylation (e.g., hypomethylation, hypermethylation) also are used as biomarkers in cancer. For example, hypomethylation has been associated with hepatocellular carcinoma (see, for example, Henrique et al., 2014, Expert Rev.
Mol. Diagn., 14:419-22), esophageal carcinogenesis (see, for example, Alvarez et al., 2011, PLoS Genet., 7:e1001356) and gastric and liver cancer (see, for example, U.S. Pat. No. 8,728,732), and hypermethylation has been associated with colorectal cancer (see, for example, U.S. Pat. No. 9,957,570). In addition to genome-wide changes in methylation, specific methylation changes within particular genes can be indicative of specific cancers (see, for example, U.S. Pat. No. 8,150,626). Li et al. (2012, J. Epidemiol., 22:384-94) provides a review of the association between numerous cancers (e.g., breast, bladder, gastric, lung, prostate, head and neck squamous cell, and nasopharyngeal) and aberrant methylation. Additionally or alternatively, additional types of nucleic acids or features of nucleic acids have been associated with various cancers. Non-limiting examples of such nucleic acids or features of nucleic acids include the following: the presence or absence of various microRNAs (miRNAs) has been used in the diagnosis of colon, prostate, colorectal, and ovarian cancers (see, for example, D'Souza et al., 2018, PLoS One, 13:e0194268; Fukagawa et al., 2017, Cancer Sci., 108:886-96; Giraldez et al., 2018, Methods Mol. Biol., 1768:459-74; U.S. Pat. Nos. 8,343,718; 9,410,956; and 9,074,206; for a review on the specific association of miR-22 with cancer, see Wang et al., 2017, Int. J. Oncol., 50:345-55); the abnormal expression of long non-coding RNAs (lncRNAs) also has been used as a biomarker in cancers such as prostate cancer, colorectal cancer, cervical cancer, melanoma, non-small cell lung cancer, gastric cancer, endometrial carcinoma, and hepatocellular carcinoma (see, for example, Wang et al., 2017, Oncotarget, 8:58577086; Wang et al., 2018, Mol. Cancer, 17:110; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4812-9; Yu et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:993-1002; Zhang et al., 2018, Eur. Rev. Med. Pharmacol. Sci., 22:4820-7; Zhang et al., 2018, Eur. Rev. Med. Pharmacol.
Sci., 22:2304-9; Xie et al., 2018, EBioMedicine, 33:57-67; and U.S. Pat. No. 9,410,206); the presence or absence of circular RNA (circRNA) has been used as a biomarker in lung cancer, breast cancer, gastric cancer, colorectal cancer, and liver cancer (e.g., Geng et al., 2018, J. Hematol. Oncol., 11:98) and melanoma (e.g., Zhang et al., 2018, Oncol. Lett., 16:1219-25); changes in telomeric DNA (e.g., in length or in heterozygosity) or centromeric DNA (e.g., changes in expression of centromeric genes) also have been associated with cancers (e.g., prostate, breast, lung, lymphoma, and Ewing's sarcoma) (see, for example, Baretton et al., 1994, Cancer Res., 54:4472-80; Liscia et al., 1999, Br. J. Cancer, 80:821-6; Proctor et al., 2009, Biochim. Biophys. Acta, 1792:260-74; and Sun et al., 2016, Int. J. Cancer, 139:899-907); various mutations (e.g., deletions), rearrangements and/or copy number changes in mitochondrial DNA (mtDNA) have been used prognostically and diagnostically for various cancers (e.g., prostate cancer, melanoma, breast cancer, lung cancer, and colorectal cancer) (see, for example, Maragh et al., 2015, Cancer Biomark., 15:763-73; Shen et al., 2010, Mitochondrion, 10:62-68; Hosgood et al., 2010, Carcinogen., 31:847-9; Thyagarajan et al., 2012, Cancer Epid. Biomarkers & Prev., 21:1574-81; and U.S. Pat. No. 9,745,632); and the abnormal presence, absence or amount of messenger RNAs (mRNAs) also has been correlated with various cancers including, without limitation, breast cancer, Wilms' tumors, and cervical cancer (see, for example, Guetschow et al., 2012, Anal. Bioanal. Chem., 404:399-406; Schwienbacher et al., 2000, Cancer Res., 60:1521-5; and Ngan et al., 1997, Genitourin Med., 73:54-8). Each of these citations is incorporated herein by reference in its entirety.

Validation of Detected Genetic Biomarkers

In some embodiments, methods provided herein can be used to verify that a genetic biomarker detected in circulating tumor DNA present in cell-free DNA indicates the presence of a cancer cell in the subject. In some embodiments, methods provided herein can be used to verify that a genetic alteration (e.g., one or more genetic alterations) detected in circulating tumor DNA present in cell-free DNA indicates the presence of a cancer cell in the subject. For example, certain genetic biomarkers (e.g., genetic alterations) that are present in cancer cells also occur in other non-cancer cells in the body. Such non-cancer cells include, without limitation, white blood cell clones arising during age-associated clonal hematopoiesis (e.g., clonal hematopoietic expansion (also known as clonal hematopoiesis of indeterminate potential or CHIP) or myelodysplasia). As a result, such clones, which may represent early forms of myelodysplasia, are a potential source of false positives in ctDNA-based assays. In such cases, detecting a genetic biomarker (e.g., a genetic alteration) in cell-free DNA can lead to a false diagnosis of cancer since the genetic biomarker (e.g., genetic alteration) arises from hematopoietic white blood cells, rather than from a cancer (e.g., a solid tumor). Methods provided herein can reduce or eliminate such false cancer diagnoses.

Methods provided herein can be used to reduce or eliminate false cancer diagnoses by determining whether one or more genetic biomarkers (e.g., genetic alterations) detected in cell-free DNA originate from hematopoietic white blood cells rather than from a cancer cell. For example, DNA can be isolated or obtained from white blood cells of a subject, and that DNA can be tested to determine the presence or absence of a genetic biomarker (e.g., a genetic alteration) that was identified in cell-free DNA from the subject and that is associated with cancer. In some embodiments, if the genetic biomarker (e.g., genetic alteration) is identified in the DNA from the white blood cells, it is indicative that the genetic biomarker (e.g., genetic alteration) identified in cell-free DNA originated from the white blood cells, and not from a cancer cell present in the subject. In some embodiments, if the genetic biomarker (e.g., genetic alteration) is not identified in the DNA from the white blood cells, it is indicative that the genetic biomarker (e.g., genetic alteration) identified in cell-free DNA originated from a cancer cell present in the subject, and not from a white blood cell. Methods of testing DNA isolated or obtained from white blood cells for the presence or absence of a genetic biomarker (e.g., a genetic alteration) that is associated with cancer in order to determine whether that genetic biomarker (e.g., genetic alteration) originates from a cancer cell in the subject are generically described herein as “verifying a genetic alteration against white blood cells”, “verifying a genetic alteration against DNA from white blood cells”, “white blood cell verification”, and similar phrases.
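
By way of non-limiting illustration, the comparison logic of white blood cell verification can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an implementation described in this disclosure: the `Variant` record, the function name `verify_against_white_blood_cells`, the label strings, and the example alterations are all hypothetical, and real variant calling (e.g., error correction, read-depth and allele-fraction criteria) is omitted.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """A genetic alteration keyed by gene and change (hypothetical representation)."""
    gene: str
    change: str  # e.g., a protein-level change such as "G12D"


def verify_against_white_blood_cells(
    cfdna_variants: Iterable[Variant],
    wbc_variants: Iterable[Variant],
) -> Dict[Variant, str]:
    """Label each cell-free DNA alteration by its likely origin.

    An alteration also found in white blood cell DNA is attributed to the
    white blood cells; one that is absent there remains a candidate
    cancer-derived alteration.
    """
    wbc_set = set(wbc_variants)
    return {
        v: "white_blood_cell_origin" if v in wbc_set else "candidate_cancer_origin"
        for v in cfdna_variants
    }


# Illustrative inputs only; these are not data from this disclosure.
cfdna = [Variant("KRAS", "G12D"), Variant("DNMT3A", "R882H")]
wbc = [Variant("DNMT3A", "R882H")]
calls = verify_against_white_blood_cells(cfdna, wbc)
```

In this sketch, an alteration found in both inputs is attributed to white blood cells and would not, by itself, support a cancer call, while an alteration found only in cell-free DNA is retained for further evaluation.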

Any genetic biomarker (e.g., genetic alteration) that is associated withcancer can be verified using methods described herein. Examples ofgenetic biomarkers (e.g., genes having genetic alterations) associatedwith cancer include, without limitation, ABCA7, ABL1, ABL2, ACVR1B,ACVR2A, AJUBA, AKT1, AKT2, ALB, ALDOB, ALK, AMBRA1, AMER1, AMOT,ANKRD46, APC, AR, ARHGAP35, ARHGEF12, ARID1A, ARID1B, ARID2, ARID4B,ARL15, ARMCX1, ASXL1, ASXL2, ATAD2, ATF1, ATG14, ATG5, ATM, ATRX, ATXN2,AXIN1, B2M, BAP1, BCL11A, BCL11B, BCL2, BCL3, BCL6, BCL9, BCLAF1, BCOR,BCR, BIRC6, BIRC8, BLM, BLVRA, BMPR1A, BRAF, BRCA1, BRCA2, BRD7, BRE,BRWD3, BTBD7, BTRC, C11orf70, C12orf57, C2CD5, C3orf62, C8orf34, CAMKV,CAPG, CARD11, CARS, CASP8, CBFA2T3, CBFB, CBLC, CBX4, CCAR1, CCDC117,CCDC88A, CCM2, CCNC, CCND1, CCND2, CCND3, CCR3, CD1D, CD79B, CDC73,CDCP1, CDH1, CDH11, CDK12, CDK4, CDK6, CDKN1A, CDKN1B, CDKN2A, CDX2,CEBPA, CELF1, CENPB, CEP128, CHD2, CHD4, CHD8, CHEK2, CHRDL1, CHUK, CIC,CLEC4C, CMTR2, CNN2, CNOT1, CNOT4, COL11A1, COPS4, COX7B2, CREB1,CREBBP, CSDE1, CSMD3, CTCF, CTDNEP1, CTNNB1, CUL1, CUL2, CYB5B, CYLD,DACH1, DCHS1, DCUN1D1, DDB2, DDIT3, DDX3X, DDX5, DDX6, DEK, DHX15,DHX16, DICER1, DIRC2, DIS3, DIXDC1, DKK2, DNAJB5, DNER, DNM1L, DNMT3A,EED, EGFR, EIF1AX, EIF2AK3, EIF2S2, EIF4A1, EIF4A2, ELF3, ELK4, EMG1,EMR3, EP300, EPB41L4A, EPHA2, EPS8, ERBB2, ERBB3, ERRFI1, ETV4, ETV6,EVI1, EWSR1, EXO5, EXT1, EXT2, EZH2, F5, FANCM, FAT1, FBN2, FBXW7,FCER1G, FEV, FGF2, FGFR1, FGFR1OP, FGFR2, FGFR3, FH, FLT3, FN1, FOXA1,FOXP1, FUBP1, FUS, GALNTL5, GATA3, GGCT, GIGYF2, GK2, GLIPR2, GNAS,GNPTAB, GNRHR, GOLGA5, GOLM1, GOPC, GOT2, GPC3, GPS2, GPX7, GRK1, GSE1,GZMA, HDAC1, HERC1, HERC4, HGF, HIST1H2BO, HLA-A, HLA-B, HMCN1, HMGA1,HMGA2, HNRNPA1, HRAS, HSP90AB1, ID3, IDH1, IDH2, IFNGR2, IFT88, IKZF2,IL2, INO80C, INPP4A, INPPL1, IRF4, IWS1, JAK1, JAK2, JUN, KANSL1, KATE,KATNAL1, KBTBD7, KCNMB4, KDM5C, KDM6A, KEAP1, KIAA1467, KIT, KLF4,KMT2A, KMT2B, KMT2C, KMT2D, KMT2E, KRAS, KRT15, LAMTOR1, LARP4B, LCK,LMO2, 
LPAR2, LYN, MAF, MAFB, MAML2, MAP2K1, MAP2K2, MAP2K4, MAP3K1,MAP4K3, MAPK1, MAX, MB21D2, MBD1, MBD6, MBNL1, MBNL3, MDM2, MDM4, MED12,MED23, MEN1, MET, MGA, MITF, MKLN1, MLH1, MLL, MLLT4, MOAP1, MORC4, MPL,MS4A1, MSH2, MSI1, MTOR, MYB, MYC, MYCL1, MYCN, MYD88, MYL6, MYO1B,MYO6, NAA15, NAA25, NAP1L2, NAP1L4, NCOA2, NCOA4, NCOR1, NEK9, NF1, NF2,NFE2L2, NFE2L3, NFKB2, NIPBL, NIT1, NKX3-1, NME4, NOTCH1, NOTCH2, NPM1,NR4A3, NRAS, NSD1, NTRK1, NUP214, NUP98, PALB2, PAX8, PBRM1, PCBP1,PCOLCE2, PDGFB, PHF6, PIK3CA, PIK3CB, PIK3R1, PIM1, PLAG1, PML, POLA2,POT1, PPARD, PPARG, PPM1D, PPP2R1A, PPP6C, PRKACA, PRKCI, PRPF40A,PSIP1, PTEN, PTH2, PTMS, PTN, PTPN11, RAB18, RAC1, RAF1, RANBP3L,RAPGEF6, RASA1, RB1, RBBP6, RBM10, RBM26, RC3H2, REL, RERE, RET, RFC4,RHEB, RHOA, RIMS2, RIT1, RNF111, RNF43, ROS1, RPL11, RPL5, RQCD1, RRAS2,RUNX1, RXRA, SARM1, SCAF11, SDHB, SDHD, SEC22A, SENP3, SENP8, SETD1B,SETD2, SF3A3, SF3B1, SFPQ, SIN3A, SKAP2, SMAD2, SMAD3, SMAD4, SMARCA4,SMARCB1, SMARCC2, SMO, SNCB, SOCS1, SOS1, SOX4, SOX9, SP3, SPEN, SPOP,SPSB2, SS18, STAG2, STK11, STK31, SUFU, SUFU, SUZ12, SYK, TAF1A, TARDBP,TAS2R30, TBL1XR1, TBX3, TCF12, TCF3, TCF7L2, TCL1A, TET2, TEX11, TFDP2,TFG, TGFBR2, THRAP3, TLX1, TM9SF1, TMCO2, TMED10, TMEM107, TMEM30A,TMPO, TNFAIP3, TNFRSF9, TNRC6B, TP53, TP53BP1, TPR, TRAF3, TRIMS,TRIP12, TSC1, TSC2, TTK, TTR, TUBA3C, U2AF1, UBE2D3, UBR5, UNC13C, UNKL,UPP1, USO1, USP28, USP6, USP9X, VHL, VN1R2, VPS33B, WAC, WDR33, WDR47,WRN, WT1, WWP1, XPO1, YOD1, ZC3H13, ZDHHC4, ZFHX3, ZFP36L1, ZFP36L2,ZGRF1, ZMYM3, ZMYM4, ZNF234, ZNF268, ZNF292, ZNF318, ZNF345, ZNF600,ZNF750, and/or ZNF800. In some embodiments, genetic biomarkers (e.g.,genes having genetic alterations) associated with cancer that can beverified against white blood cell DNA isolated or obtained from asubject include tumor suppressor genes or oncogenes. In someembodiments, one or more codons and/or their surrounding splice sitescan be tested according to methods disclosed herein. 
Exemplary codons of tumor suppressor genes and oncogenes which may be tested include, without limitation, one or more of the following codons and their surrounding splice sites: codons 16-18 of AKT1; codons 1304-1311, 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58, 76-88 of CDKN2A; codons 31-39, 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, 143-148 of KRAS; codons 3-15, 54-63 of NRAS; codons 80-90, 343-348, 541-551, 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, 145-154 of PTEN; and codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, 374-386 of TP53. In some embodiments, genetic biomarkers (e.g., genes having genetic alterations) associated with cancer that can be verified against white blood cell DNA isolated or obtained from a subject include one or more of the genetic biomarkers (e.g., genetic alterations) identified in Table 11 or Table 12. Testing DNA isolated or obtained from white blood cells may include amplification and/or sequencing of such DNA (e.g., using any of the methods described herein including, without limitation, Safe-SeqS methods). Using a technique such as Safe-SeqS provides the same advantages when testing DNA isolated or obtained from white blood cells as it does when testing cell-free DNA from the subject.
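
By way of non-limiting illustration, a lookup of whether a given codon falls within the exemplary tested codon ranges can be sketched as follows. The sketch assumes inclusive ranges and encodes only a subset of the ranges listed above (AKT1, BRAF, HRAS, KRAS, and NRAS); the names `TESTED_CODONS` and `codon_is_tested` are hypothetical, and surrounding splice sites are not modeled.

```python
from typing import Dict, List, Tuple

# Subset of the exemplary codon ranges listed above (inclusive bounds).
TESTED_CODONS: Dict[str, List[Tuple[int, int]]] = {
    "AKT1": [(16, 18)],
    "BRAF": [(591, 602)],
    "HRAS": [(7, 19)],
    "KRAS": [(7, 14), (57, 65), (143, 148)],
    "NRAS": [(3, 15), (54, 63)],
}


def codon_is_tested(gene: str, codon: int) -> bool:
    """Return True if the codon lies in any encoded range for the gene."""
    return any(start <= codon <= end for start, end in TESTED_CODONS.get(gene, []))
```

For example, `codon_is_tested("KRAS", 12)` evaluates to True because codon 12 lies within codons 7-14 of KRAS, whereas `codon_is_tested("KRAS", 20)` evaluates to False. A gene absent from the encoded subset returns False here even if it appears in the fuller list above.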

In some embodiments, a single genetic biomarker (e.g., genetic alteration) is verified against DNA isolated or obtained from white blood cells. In some embodiments, more than one genetic biomarker (e.g., genetic alteration) is verified against DNA isolated or obtained from white blood cells. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more genetic biomarkers (e.g., genetic alterations) can be verified against DNA isolated or obtained from white blood cells using methods described herein. In some embodiments, one or more genetic biomarkers (e.g., genetic alterations) are verified against DNA isolated or obtained from white blood cells using a plurality of samples that are isolated or obtained from white blood cells. For example, one or more genetic biomarkers (e.g., genetic alterations) can be verified against DNA isolated or obtained from white blood cells by isolating DNA from two white blood cell samples isolated or obtained from the subject. Verifying genetic biomarkers (e.g., genetic alterations) against DNA isolated or obtained from white blood cells using a plurality of samples can increase the sensitivity of the testing, thus leading to a more accurate diagnosis.

In some embodiments, one or more genetic biomarkers (e.g., genetic alterations) can be determined to originate from a cancer cell in the subject (or not) by verifying the genetic biomarker(s) (e.g., genetic alteration(s)) against DNA isolated or obtained from white blood cells in the absence of additional diagnostic testing methods. In some embodiments, one or more genetic biomarkers (e.g., genetic alterations) can be determined to originate from a cancer cell in the subject (or not) by verifying the genetic biomarker(s) (e.g., genetic alteration(s)) against DNA isolated or obtained from white blood cells in combination with additional diagnostic testing methods. In some embodiments, such additional diagnostic testing methods can include one or more of the diagnostic testing methods described herein. In some embodiments, such additional diagnostic testing methods can include testing a protein biomarker (e.g., one or more of the protein biomarkers disclosed herein). In some embodiments, such additional diagnostic testing methods can include testing a protein biomarker (e.g., one or more of the protein biomarkers disclosed herein) at a certain threshold level. Examples of protein biomarkers that can be combined with white blood cell verification include, without limitation, carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3. Any of the variety of threshold levels for such protein biomarkers disclosed herein can be used in combination with white blood cell verification of genetic biomarker(s) (e.g., genetic alteration(s)) found in cell-free DNA. Exemplary and non-limiting threshold levels for certain protein biomarkers include: CA19-9 (>92 U/ml), CEA (>7,507 pg/ml), CA125 (>577 U/ml), AFP (>21,321 pg/ml), Prolactin (>145,345 pg/ml), HGF (>899 pg/ml), OPN (>157,772 pg/ml), TIMP-1 (>176,989 pg/ml), Follistatin (>1,970 pg/ml), G-CSF (>800 pg/ml), and CA15-3 (>98 U/ml).
In some embodiments, threshold levels for protein biomarkers can be higher (e.g., about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 100%, or higher) than the exemplary threshold levels described herein. In some embodiments, threshold levels for protein biomarkers can be lower (e.g., about 10%, about 20%, about 30%, about 40%, about 50%, or lower) than the exemplary threshold levels described herein. In certain embodiments, testing a single protein biomarker is combined with white blood cell verification of genetic biomarker(s) (e.g., genetic alteration(s)) found in cell-free DNA. In certain embodiments, testing more than one protein biomarker (e.g., two, three, four, five, six, seven, eight, nine, ten, eleven, or more protein biomarkers) is combined with white blood cell verification of genetic biomarker(s) (e.g., genetic alteration(s)) found in cell-free DNA.
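
By way of non-limiting illustration, applying the exemplary protein biomarker threshold levels, optionally raised or lowered by a percentage, can be sketched as follows. The threshold values are those recited above; the names `THRESHOLDS` and `proteins_above_threshold`, the `scale` parameter, and the example measurements are hypothetical. Measurements are assumed to already be expressed in the units shown.

```python
from typing import Dict, List, Tuple

# Exemplary threshold levels recited above: (value, units).
THRESHOLDS: Dict[str, Tuple[float, str]] = {
    "CA19-9": (92.0, "U/ml"),
    "CEA": (7507.0, "pg/ml"),
    "CA125": (577.0, "U/ml"),
    "AFP": (21321.0, "pg/ml"),
    "Prolactin": (145345.0, "pg/ml"),
    "HGF": (899.0, "pg/ml"),
    "OPN": (157772.0, "pg/ml"),
    "TIMP-1": (176989.0, "pg/ml"),
    "Follistatin": (1970.0, "pg/ml"),
    "G-CSF": (800.0, "pg/ml"),
    "CA15-3": (98.0, "U/ml"),
}


def proteins_above_threshold(
    measurements: Dict[str, float], scale: float = 1.0
) -> List[str]:
    """Return, sorted by name, the biomarkers measured above threshold.

    `scale` adjusts every threshold, e.g., 1.2 for thresholds about 20%
    higher and 0.5 for thresholds about 50% lower. Biomarkers without a
    listed threshold are ignored.
    """
    hits = []
    for name, value in measurements.items():
        if name in THRESHOLDS and value > THRESHOLDS[name][0] * scale:
            hits.append(name)
    return sorted(hits)
```

For example, with hypothetical measurements of 100 U/ml for CA19-9 and 5,000 pg/ml for CEA, only CA19-9 exceeds its exemplary threshold at `scale=1.0`; neither does at `scale=1.2`; and both do at `scale=0.5`. Such a result could then be combined with white blood cell verification of genetic biomarker(s) found in cell-free DNA.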

In some embodiments, a plurality of genetic biomarkers (e.g., a panel of genetic biomarkers) is verified against DNA isolated or obtained from white blood cells. In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes one or more of NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS. In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes each of NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and GNAS. In some embodiments, one or more genetic biomarkers (e.g., genetic alterations) that include one or more of (e.g., each of) NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS can be determined to originate from a cancer cell in the subject (or not) by verifying the genetic biomarker(s) against DNA isolated or obtained from white blood cells in combination with additional diagnostic testing methods. Such additional diagnostic testing methods include, without limitation, testing for the presence of one or more protein biomarkers and/or for the presence of aneuploidy. In some embodiments, the one or more protein biomarkers can be one or more of (e.g., each of) CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or myeloperoxidase (MPO). In some embodiments, the one or more protein biomarkers can be one or more of (e.g., each of) CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, or CA15-3. In some embodiments, the one or more protein biomarkers can be one or more of (e.g., each of) CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, or CA15-3.

In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes one or more of KRAS, TP53, CDKN2A, or SMAD4. In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes each of KRAS, TP53, CDKN2A, and SMAD4. In some embodiments, one or more genetic biomarkers (e.g., genetic alterations) that include one or more of (e.g., each of) KRAS, TP53, CDKN2A, or SMAD4 can be determined to originate from a cancer cell in the subject (or not) by verifying the genetic biomarker(s) against DNA isolated or obtained from white blood cells in combination with additional diagnostic testing methods. Such additional diagnostic testing methods include, without limitation, testing for the presence of one or more protein biomarkers and/or for the presence of aneuploidy. In some embodiments, the one or more protein biomarkers can be one or more of (e.g., each of) CA19-9, CEA, HGF, or OPN.

In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes one or more of NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A. In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes each of NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and CDKN2A. In some embodiments, one or more genetic biomarkers (e.g., genetic alterations) that include one or more of (e.g., each of) NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A can be determined to originate from a cancer cell in the subject (or not) by verifying the genetic biomarker(s) against DNA isolated or obtained from white blood cells in combination with additional diagnostic testing methods. Such additional diagnostic testing methods include, without limitation, testing for the presence of one or more protein biomarkers and/or for the presence of aneuploidy (e.g., aneuploidy in one or more chromosomes or chromosomal arms that are associated with the presence of cancer). In some embodiments, the presence of aneuploidy can be detected on one or more of chromosomal arms 4p, 7q, 8q, or 9q.

In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes one or more of TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, or VHL. In some embodiments, a plurality of genetic biomarkers that is verified against DNA isolated or obtained from white blood cells includes each of TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL. In some embodiments, one or more genetic biomarkers (e.g., genetic alterations) that include one or more of (e.g., each of) TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, or VHL can be determined to originate from a cancer cell in the subject (or not) by verifying the genetic biomarker(s) against DNA isolated or obtained from white blood cells in combination with additional diagnostic testing methods. Such additional diagnostic testing methods include, without limitation, testing for the presence of one or more protein biomarkers and/or for the presence of aneuploidy (e.g., aneuploidy in one or more chromosomes or chromosomal arms that are associated with the presence of cancer). In some embodiments, the presence of aneuploidy can be detected on one or more of chromosomal arms 5q, 8q, or 9p.

In some embodiments, a sample isolated or obtained from the subject that is used to isolate or obtain white blood cell DNA can be the same as the sample isolated or obtained from the subject that is used for testing genetic biomarkers in cell-free DNA and/or protein biomarkers. For example, the sample can be a blood sample (e.g., whole blood), which blood sample is subsequently separated into a plasma fraction and a white blood cell fraction. Such separation can be achieved, for example, by density gradient centrifugation in which plasma is separated from white blood cells, which are typically found in the buffy coat. After such separation, DNA from the white blood cells in the buffy coat can be isolated and tested, while genetic biomarkers in cell-free DNA from the plasma fraction can also be isolated and tested. In some embodiments, the plasma fraction is allowed to clot in order to obtain a serum fraction. In some embodiments, the sample is frozen, refrigerated, or otherwise stored prior to and/or after testing and/or fractionation. In some embodiments, one or more fractions from the sample are frozen, refrigerated, or otherwise stored prior to and/or after testing.

In some embodiments, a sample isolated or obtained from the subject that is used to isolate or obtain white blood cell DNA can be different from the sample isolated or obtained from the subject that is used for testing genetic biomarkers in cell-free DNA and/or protein biomarkers. For example, a first sample can be isolated or obtained from the subject, which first sample is used for testing genetic biomarkers in cell-free DNA and/or protein biomarkers. Prior to, simultaneously with, or after the first sample is isolated or obtained, a second sample can be isolated or obtained from the subject, which second sample is used to isolate or obtain DNA from white blood cells for verifying one or more genetic biomarkers (e.g., genetic alterations) identified in cell-free DNA. The second sample can be fractionated as described herein (e.g., by density gradient centrifugation). In some embodiments, the first and/or second sample (or fractions thereof) can be frozen, refrigerated, or otherwise stored prior to and/or after testing and/or fractionation.

In some embodiments, once a genetic biomarker (e.g., a geneticalteration) has been verified against DNA isolated or obtained fromwhite blood cells and the genetic biomarker (e.g., a genetic alteration)is determined not to be present in DNA isolated from white blood cells(e.g., the genetic biomarker (e.g., a genetic alteration) is determinedto originate from a cancer cell in the subject), the subject can undergofurther diagnostic testing or increased monitoring (e.g., using one ormore of the diagnostic testing and/or monitoring methods disclosedherein). In some embodiments, once a genetic biomarker (e.g., a geneticalteration) has been verified against DNA isolated or obtained fromwhite blood cells and the genetic biomarker (e.g., a genetic alteration)is determined not to be present in DNA isolated from white blood cells(e.g., the genetic biomarker (e.g., a genetic alteration) is determinedto originate from a cancer cell in the subject), the subject can beadministered a therapeutic intervention (e.g., one or more of thetherapeutic interventions disclosed herein).

Sample Classification

The present disclosure provides methods of identifying the presence of cancer in a subject based on one or more protein biomarkers (e.g., protein concentrations in whole blood or plasma). Various methods can be used to determine whether the subject has cancer and/or the likelihood that the subject has cancer. These methods involve various types of statistical techniques and methods described herein, including, e.g., a regression model, a logistic regression model, a neural network, a clustering model, principal component analysis, correlated component analysis, nearest neighbor classifier analysis, linear discriminant analysis, quadratic discriminant analysis, a support vector machine, a decision tree, Random Forest, a genetic algorithm, classifier optimization using bagging, classifier optimization using boosting, classifier optimization using the Random Subspace Method, a projection pursuit, genetic programming, and weighted voting.

In some embodiments, a regression analysis can be used to determine whether a subject has cancer. The regression analysis can be performed on a panel of protein biomarkers, a panel of genetic biomarkers (e.g., mutations), and/or a panel that includes both protein biomarkers and genetic biomarkers. In some embodiments, the regression analysis is performed on the Ω score for one or more mutations (e.g., the top mutation) and/or a panel of protein biomarkers.

In some embodiments, the regression analysis is performed based on a mathematical model having the form:

$$V = \alpha + \sum_{i} \beta_i f(X_i)$$

In this form of the model, V is a value indicating the likelihood score that a subject has cancer. In some embodiments, the likelihood score is indicative of the probability that a test subject has cancer. X_i represents the value of each biomarker (e.g., Ω score, protein concentrations in plasma, etc.). β_i is the coefficient for f(X_i), which is a variable that corresponds to the value of the biomarker. The function f(x) maps a biomarker value x to a corresponding value. In some embodiments, f(x)=x. Thus, the mathematical model can have the form V=α+Σβ_i X_i. In some other embodiments, f(x) may be a function for normalization or standardization. In some embodiments, the formula may include additional parameters to account for age, sex, and race category.

In some embodiments, V is a likelihood score indicating whether a subject has cancer. In some embodiments, V is an actual probability (a number varying between 0 and 1). In other embodiments, V is a value from which a probability can be derived.

In some embodiments, the mathematical model is a regression model, forexample, a logistic regression model or a linear regression model. Theregression model can be used to test various sets of biomarkers.

In the case of linear regression models, the model can be used to analyze expression data from a test subject and to provide a result indicative of a quantitative measure of the test subject, for example, the likelihood that the subject has cancer.

In general, a linear regression equation is expressed as

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

Y, the dependent variable, indicates a quantitative measure of a biological feature (e.g., likelihood of having cancer or not having cancer). The dependent variable Y depends on k explanatory variables (the measured characteristic values for the biomarkers), plus an error term that encompasses various unspecified omitted factors. In the above-identified model, the parameter β₁ gauges the effect of the first explanatory variable X₁ on the dependent variable Y. β₂ gives the effect of the explanatory variable X₂ on Y.

A logistic regression model is a non-linear transformation of the linear regression. The logistic regression model is often referred to as the “logit” model and can be expressed as

$$\ln\left[\frac{p}{1-p}\right] = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$

where,
- α is a constant;
- ε is an error term;
- ln is the natural logarithm, log_e, where e=2.71828 . . . ;
- p is the probability that the event Y occurs;
- p/(1−p) is the “odds”; and
- ln[p/(1−p)] is the log odds, or “logit.”

It will be appreciated by those of skill in the art that α and ε can be folded into a single constant, and expressed as α. In some embodiments, a single term α is used, and ε is omitted. The “logistic” distribution is an S-shaped distribution function. The logistic function constrains the estimated probabilities (p) to lie between 0 and 1.

In some embodiments, the logistic regression model is expressed as

$$Y = \alpha + \sum_{i} \beta_i X_i$$

Here, Y is a value (e.g., a likelihood score) indicating whether the set of biomarkers for a given subject should classify with the case group (e.g., groups of subjects with cancer), as opposed to the control group (e.g., groups of subjects without cancer). The probability that the set of biomarkers classifies with the case group rather than the control group, and thus the probability that the subject has cancer, can be derived from Y. The higher the score, the higher the probability that the subject has cancer.

X_i is the value of the i-th biomarker. In some embodiments, it can be a protein concentration in plasma, gender, age, or a score derived from genetic markers (e.g., Ω score). β_i is a logistic regression equation coefficient for the biomarker, α is a logistic regression equation constant that can be zero, and β_i and α are the result of applying logistic regression analysis to the case group and the control group.

In some embodiments, the logistic regression model is fit by maximum likelihood estimation (MLE). The coefficients (e.g., α, β_1, β_2, . . . ) are determined by maximum likelihood. A likelihood is a conditional probability (e.g., P(Y|X), the probability of Y given X). The likelihood function (L) measures the probability of observing the particular set of dependent variable values (Y_1, Y_2, . . . , Y_n) that occur in the sample data set. In some embodiments, it is written as the product of the probabilities of observing Y_1, Y_2, . . . , Y_n:

$$L = \mathrm{Prob}(Y_1, Y_2, \ldots, Y_n) = \mathrm{Prob}(Y_1) \cdot \mathrm{Prob}(Y_2) \cdots \mathrm{Prob}(Y_n)$$

The higher the likelihood function, the higher the probability of observing the Ys in the sample. MLE involves finding the coefficients (α, β_1, β_2, . . . ) that make the log of the likelihood function (LL<0) as large as possible or, equivalently, −2 times the log of the likelihood function (−2LL) as small as possible. In MLE, some initial estimates of the parameters α, β_1, β_2, and so forth are made. Then, the likelihood of the data given these parameter estimates is computed. The parameter estimates are improved, and the likelihood of the data is recalculated. This process is repeated until the parameter estimates remain substantially unchanged (for example, a change of less than 0.01 or 0.001). Examples of logistic regression and fitting logistic regression models are found in Hastie, The Elements of Statistical Learning, Springer, N.Y., 2001, pp. 95-100.
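The iterative fitting loop described above can be sketched as follows. This is a minimal illustration only: it assumes plain gradient ascent as the rule for improving the estimates (the disclosure does not name a particular optimizer), and the function names, step size, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(params, rows, labels):
    """LL = sum over subjects of ln Prob(Y_j); params = [alpha, beta_1, ...]."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(params[0] + sum(b * v for b, v in zip(params[1:], x)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit_logistic(rows, labels, step=0.1, tol=0.001, max_iter=10000):
    """Start from initial estimates of zero and improve them until the
    largest change in any parameter is below tol (assumed stopping rule)."""
    params = [0.0] * (len(rows[0]) + 1)
    for _ in range(max_iter):
        grad = [0.0] * len(params)
        for x, y in zip(rows, labels):
            p = sigmoid(params[0] + sum(b * v for b, v in zip(params[1:], x)))
            err = y - p  # gradient of LL with respect to the linear score
            grad[0] += err
            for i, v in enumerate(x):
                grad[i + 1] += err * v
        updated = [w + step * g for w, g in zip(params, grad)]
        change = max(abs(u - w) for u, w in zip(updated, params))
        params = updated
        if change < tol:
            break
    return params


# Hypothetical toy data: one biomarker value per subject, 1 = case, 0 = control.
rows = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
labels = [0, 0, 1, 0, 1, 1]
fitted = fit_logistic(rows, labels)
```

Statistical packages typically use Newton-type updates rather than a fixed step size; the fixed step is used here only to keep the sketch short.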

Once the logistic regression equation coefficients and the logistic regression equation constant are determined, the model can be readily applied to a test subject to obtain Y. In some embodiments, Y can be used to calculate the probability (p) by solving the function Y=ln(p/(1−p)) for p, which gives p=1/(1+e^(−Y)).
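As an illustration, applying a fitted model to a test subject and converting Y to a probability might look like the sketch below. The function names and the coefficient values in the usage note are hypothetical and are not taken from the disclosure.

```python
import math


def likelihood_score(alpha, betas, values):
    """Y = alpha + sum(beta_i * X_i) for one subject's biomarker values."""
    if len(betas) != len(values):
        raise ValueError("one coefficient is required per biomarker value")
    return alpha + sum(b * x for b, x in zip(betas, values))


def probability_from_score(y):
    """Solve Y = ln(p / (1 - p)) for p, i.e. p = 1 / (1 + exp(-Y))."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)  # avoids overflow for large negative scores
    return e / (1.0 + e)
```

For example, with the hypothetical values alpha = -1.0, betas = [0.5, 2.0], and biomarker values [2.0, 0.25], `likelihood_score` returns 0.5, and a score of 0 corresponds to a probability of 0.5.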

In some embodiments, explanatory variables are normalized orstandardized before fitting into the model. Standardized coefficients(or beta coefficients) are the estimates resulting from a regressionanalysis that have been standardized so that the variances of dependentand explanatory variables are 1. Therefore, standardized coefficientsrepresent how many standard deviations a dependent variable will change,per standard deviation increase in the explanatory variable. Forunivariate regression, the absolute value of the standardizedcoefficient equals the correlation coefficient. Standardization of thecoefficient is usually performed to identify which of the explanatoryvariables have a greater effect on the dependent variable in a multipleregression analysis. In some embodiments, variables are standardized ornormalized before fitting into a logistic regression model. Standardizedlogistic regression coefficients (or standardized beta coefficients) arethe estimates resulting from performing a logistic regression analysison variables that have been standardized. In some embodiments, onlyexplanatory variables are standardized, and in some other embodiments,only dependent variables are standardized. Further, in some embodiments,both explanatory variables and dependent variables are standardized. Insome embodiments, the standardized regression coefficient equals thecorresponding unstandardized coefficient multiplied by the ratiostd(X_(i))/std(Y), where “std” denotes standard deviation.
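The standardization described above can be sketched as follows, assuming the sample standard deviation is used (the disclosure does not specify sample versus population); the function names are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def standardized_coefficient(b, x_values, y_values):
    """Unstandardized coefficient multiplied by std(X_i) / std(Y)."""
    return b * statistics.stdev(x_values) / statistics.stdev(y_values)
```

For a univariate fit of y = 0.5x on perfectly linear data, the standardized coefficient is 1, matching the correlation coefficient as stated above.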

In some embodiments, the omega score (e.g., the omega score for the top mutation) can be used as an explanatory variable in a logistic regression. In some embodiments, the logistic regression can include one or more other explanatory variables (e.g., concentrations of proteins). In some embodiments, the protein biomarkers can be selected based on a Mann-Whitney-Wilcoxon test. In some embodiments, the selected protein biomarkers have higher median values in cancer samples than in normal samples. In some embodiments, forward selection is used to select explanatory variables from all biomarkers (including genetic biomarkers and protein biomarkers).

Applying a mathematical model to the data can generate one or more classifiers. The classifiers are mathematical models with appropriate parameters (e.g., β coefficients in a regression model). These parameters can be determined by applying a mathematical model to a training dataset, e.g., a dataset that includes both control subjects and a group of subjects that have cancer.

A classifier can be evaluated for its ability to properly characterize each subject in a dataset (e.g., a training dataset or a validation dataset) using methods known to a person of ordinary skill in the art. Various statistical criteria can be used, for example, area under the curve (AUC), percentage of correct predictions, sensitivity, and/or specificity. In some embodiments, the classifier is evaluated by cross validation, leave-one-out cross validation (LOOCV), n-fold cross validation, or jackknife analysis. In some embodiments, each classifier is evaluated for its ability to properly characterize those subjects in a dataset not used to generate the classifier (a “test dataset”).
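A minimal sketch of how subjects might be partitioned for n-fold cross validation is shown below. The function name is hypothetical, and the sketch assigns subjects to folds by position; in practice subjects would usually be shuffled, and possibly stratified by case/control status, before splitting.

```python
def k_fold_splits(n_subjects, n_folds):
    """Yield (train_indices, test_indices) pairs.

    Setting n_folds equal to n_subjects gives leave-one-out cross validation.
    """
    if not 2 <= n_folds <= n_subjects:
        raise ValueError("n_folds must be between 2 and n_subjects")
    indices = list(range(n_subjects))
    for fold in range(n_folds):
        test = indices[fold::n_folds]  # every n_folds-th subject is held out
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test
```

Each subject appears in exactly one test fold, so every subject is scored by a classifier that was not trained on it.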

In some embodiments, the method used to evaluate the classifier for its ability to properly characterize each subject in a dataset is a method that evaluates the classifier's sensitivity (true positive fraction) and 1−specificity (false positive fraction). In some embodiments, the method used to test the classifier is a Receiver Operating Characteristic (ROC) analysis, which provides several parameters to evaluate both the sensitivity and the specificity of the result of the equation generated. In some embodiments, the ROC area (area under the curve) is used to evaluate the equations. A ROC area greater than 0.5, 0.6, 0.7, 0.8, or 0.9 is preferred. A perfect ROC area score of 1.0 is indicative of both 100% sensitivity and 100% specificity. In some embodiments, classifiers are selected on the basis of the evaluation score. In some embodiments, the evaluation scoring system used is a receiver operating characteristic (ROC) curve score determined by the area under the ROC curve. In some embodiments, classifiers with scores of greater than 0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, or 0.5 are chosen. In some embodiments, where specificity is important to the use of the classifier, a sensitivity threshold can be set, and classifiers ranked on the basis of the specificity are chosen. For example, classifiers with a cutoff for specificity of greater than 0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5, or 0.45 can be chosen. Similarly, the specificity threshold can be set, and classifiers ranked on the basis of sensitivity (e.g., greater than 0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5, or 0.45) can be chosen. Thus, in some embodiments, only the top ten ranking classifiers, the top twenty ranking classifiers, or the top one hundred ranking classifiers are selected. The ROC curve can be calculated by various statistical tools, including, but not limited to, Statistical Analysis System (SAS®), R, and CORExpress® statistical analysis software.
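For illustration, the ROC area can be computed directly from classifier scores without plotting the curve. The sketch below uses the pairwise (Mann-Whitney) formulation, which is one standard way of obtaining the area and is not necessarily how the tools named above compute it; the function name is hypothetical.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve.

    Equals the fraction of (case, control) pairs in which the case subject
    receives the higher score, with tied pairs counted as one half.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

A classifier that scores every case above every control reaches 1.0, and a classifier whose scores carry no information about case status is expected to score about 0.5.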

As would be understood by a person of ordinary skill in the art, theutility of the combinations and classifiers determined by a mathematicalmodel will depend upon some characteristics (e.g., race, age group,gender, medical history) of the population used to generate the data forinput into the model. One can select the individually identifiedbiomarkers or subsets of the individually identified genes, and test allpossible combinations of the selected biomarkers to identify usefulcombinations of biomarker sets.

In some embodiments, a subject's likelihood score (e.g., the Y value in a logistic regression) can be used to determine whether a subject is likely to have cancer. Thus, if the likelihood score is greater than a pre-determined reference threshold, the subject is likely to have cancer. In some embodiments, if the likelihood score is less than a reference threshold, the subject is not likely to have cancer. A person skilled in the art will appreciate that the appropriate reference threshold for each classifier can be different, and can be optimized for various statistical measures (e.g., sensitivity, specificity, percentage of correct predictions). In some embodiments, the reference threshold is determined by experiments or in a clinical trial.

In some embodiments, multiple classifiers are created that aresatisfactory for the given purpose (e.g., all have sufficient AUC and/orsensitivity and/or specificity). In some embodiments, a formula isgenerated that utilizes more than one classifier. For example, a formulacan be generated that utilizes classifiers in series. Other possiblecombinations and weightings of classifiers would be understood and areencompassed herein.

In some embodiments, the probability that a subject has cancer can be derived from the likelihood score. Thus, if the probability is greater than a pre-determined threshold, for example, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, or 0.9, the subject will be administered a treatment for cancer. In some embodiments, if the probability is less than a pre-determined threshold, for example, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, or 0.9, the subject will not be administered a treatment for cancer. In some embodiments, the pre-determined threshold for treatment is 0.4, 0.5, or 0.6, and the pre-determined threshold for not treating, or for discontinuation of treatment, is 0.4, 0.5, or 0.6. In some embodiments, the pre-determined threshold is determined by experiments or in a clinical trial.

A person skilled in the art will also appreciate that the sensitivity and the specificity of the method depend on the reference threshold (or the cut-off point). When the reference threshold is raised, the sensitivity will decrease, but the specificity will increase. In some embodiments, the reference threshold can be optimized for the sensitivity, the specificity, or the percentage of correct predictions.
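The trade-off can be made concrete with a short sketch. It assumes that a subject is called positive when the score is strictly greater than the threshold; the function name and the scores are hypothetical.

```python
def sensitivity_specificity(case_scores, control_scores, threshold):
    """Return (sensitivity, specificity) at a given reference threshold."""
    true_pos = sum(1 for s in case_scores if s > threshold)
    true_neg = sum(1 for s in control_scores if s <= threshold)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Hypothetical probabilities for three cancer subjects and three controls.
cases = [0.9, 0.7, 0.4]
controls = [0.6, 0.3, 0.1]
results = [sensitivity_specificity(cases, controls, t) for t in (0.2, 0.5, 0.8)]
```

Scanning the threshold from low to high in this way shows sensitivity falling (or holding steady) while specificity rises (or holds steady), which is why the threshold is chosen for the measure that matters most in a given use.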

Determining Tissue of Origin

The present disclosure provides methods of determining cancer type, orthe origin of a cancer cell or a tumor cell. Mathematical models can beapplied to various biomarkers as described herein (e.g., proteinconcentrations for various biomarkers, genetic mutations in variousbiomarkers).

Mathematical models useful in accordance with the disclosure includethose using both supervised and unsupervised learning techniques. Insome embodiments, the mathematical model chosen uses supervised learningin conjunction with a training dataset to evaluate each possiblecombination of biomarkers. Various mathematical models can be used, forexample, a regression model, a logistic regression model, a neuralnetwork, a clustering model, principal component analysis, correlatedcomponent analysis, nearest neighbor classifier analysis, lineardiscriminant analysis, quadratic discriminant analysis, a support vectormachine, a decision tree, Random Forest, a genetic algorithm, classifieroptimization using bagging, classifier optimization using boosting,classifier optimization using the Random Subspace Method, a projectionpursuit, and genetic programming and weighted voting, etc.

In some embodiments, a supervised learning model is used to determinethe origin of a cancer cell or a tumor cell. The supervised learningmodel refers to a model that learns a function that maps an input to anoutput based on example input-output pairs. It can infer a function fromlabeled training data consisting of a set of training examples. Insupervised learning, each example is a pair consisting of an inputobject (typically a vector, e.g., a vector of protein biomarkers) and adesired output value (also called the supervisory signal, e.g., cancertype or tissue of origin). A supervised learning algorithm analyzes thetraining data and produces an inferred function, which can be used formapping the biomarkers obtained from a test subject. Many supervisedmachine learning methods can be used. The methods include, but are notlimited to, Support Vector Machines, regression analysis, linearregression, logistic regression, naive Bayes, linear discriminantanalysis, decision trees, k-nearest neighbor algorithm, neural networks(e.g., Multilayer perceptron).

In some embodiments, an unsupervised learning model is used to determinethe origin of a cancer cell or a tumor cell. The unsupervised machinelearning refers to the machine learning task of inferring a functionthat describes the structure of “unlabeled” data (i.e. data that has notbeen classified or categorized). This is performed under the assumptionthat relevant biomarkers will have more similarity if the samples havethe same origin. The unsupervised machine learning can identify theseshared characteristics and apply these models to biomarkers obtainedfrom a test subject, thereby determining the origin of a cancer cell ora tumor cell in the subject. Many unsupervised machine learning methodscan be used. The methods include, but are not limited to, clustering(e.g., k-means clustering, mixture model clustering, and hierarchicalclustering, etc.), anomaly detection, unsupervised neural networks(e.g., autoencoders, deep belief nets, Hebbian learning, generativeadversarial networks, and self-organizing map, etc.).

In some embodiments, Random Forest is used. Random Forest refers to a learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random Forest can be implemented by various programs, e.g., by the randomForest package (Liaw, Andy, and Matthew Wiener. “Classification and regression by randomForest.” R News 2.3 (2002): 18-22). In some embodiments, Random Forest identifies the presence of cancer in a subject based on one or more protein biomarkers (e.g., protein concentrations in whole blood or plasma). In some embodiments, more than ten rounds of 10-fold cross validation can be performed.

In some embodiments, support vector machines (SVM) can be used todetermine tissue of origin. SVM starts with a set of training examples,each marked as belonging to one or the other of two categories (e.g.,cancer type). The SVM training algorithm builds a model that assigns newexamples to one category or the other, making it a non-probabilisticbinary linear classifier (e.g., cancer from a particular origin, andcancer that is not from the particular origin). An SVM model is arepresentation of the examples as points in space, mapped so that theexamples of the separate categories are divided by a clear gap that isas wide as possible.

These mathematical models can be applied to various biomarkers, orpanels of biomarkers (e.g., protein biomarkers) as described herein todetermine the origin of the cancer cell. In some embodiments, thesemethods are only applied to subjects who have been predicted to havecancer.

In some embodiments, once a subject is determined to have or isdetermined as being likely to have cancer, a tissue of origin for thecancer is determined as shown in FIG. 68. For example, a decision can bemade at each branch of the tree shown in FIG. 68, and once a terminus isreached (e.g. there are no more decisions to be made), the tissue oforigin can be predicted or determined as shown. The tree shown in FIG.68 provides a framework for a practitioner to predict or determine thetissue of origin of a cancer (e.g., breast cancer, colorectal cancer,liver cancer, lung cancer, ovarian cancer, pancreatic cancer, andgastrointestinal cancer).

The tree shown in FIG. 68 is exemplary and non-limiting. For example, atsome decision points, the level of a protein biomarker (e.g. one or moreprotein biomarkers) can be determined (e.g., the level of a proteinbiomarker present in a sample obtained from a subject), and the level ofthat protein biomarker(s) can be compared to the level indicated on thetree shown in FIG. 68. In some embodiments, the level of a proteinbiomarker is determined (e.g., the level of a protein biomarker presentin a sample obtained from a subject), and the level of that proteinbiomarker is compared to the level that differs from the level indicatedon the tree shown in FIG. 68. In some embodiments, the different levelof a protein biomarker (e.g. one or more of the protein biomarkers) onthe tree shown in FIG. 68 is about 10% different from (e.g., about 10%greater than or 10% less than), about 15% different from (e.g., about15% greater than or 15% less than), about 20% different from (e.g.,about 20% greater than or 20% less than), about 25% different from(e.g., about 25% greater than or 25% less than), about 30% differentfrom (e.g., about 30% greater than or 30% less than), about 35%different from (e.g., about 35% greater than or 35% less than), about40% different from (e.g., about 40% greater than or 40% less than),about 45% different from (e.g., about 45% greater than or 45% lessthan), about 50% different from (e.g., about 50% greater than or 50%less than) the level of that protein biomarker shown at one or moredecision points on the tree shown in FIG. 68. At some decision points,the gender (e.g., the biological gender) of the subject is determinedand compared to the gender at one or more decision points indicated onthe tree shown in FIG. 68. At some decision points, the subject can bedetermined to be a woman. At some decision points, the subject can bedetermined to be a man.

In some embodiments, once a subject is determined to have or is determined as being likely to have cancer, a tissue of origin for the cancer is determined according to the list of rules shown in FIG. 69 and the table below. For example, the level of a protein biomarker (e.g., one or more protein biomarkers) can be determined (e.g., the level of a protein biomarker present in a sample obtained from a subject), and the level of that protein biomarker(s) can be compared to the level indicated in the rules shown in FIG. 69 and the table below. As one non-limiting example, the rule [condition=CA125>102.76 & sFas<=830.345 & gender %in% c(‘F’), prediction=ovarian] means that if a woman (gender %in% c(‘F’)) has CA125>102.76 and sFas<=830.345, then the woman has or is predicted to have ovarian cancer. A person skilled in the art will be able to detect the level of one or more protein biomarkers in a subject and/or determine the gender (e.g., biological gender) of a subject, and determine that the subject has or is likely to have a cancer type listed in FIG. 69. In some embodiments, once a subject is determined to have or is determined as being likely to have cancer, a tissue of origin for the cancer is determined to be colorectal when the level of one or more protein biomarkers in a subject and/or the gender (e.g., biological gender) of a subject does not follow any of the specific rules shown in FIG. 69 and the table below. In some embodiments, the level of a protein biomarker is determined (e.g., the level of a protein biomarker present in a sample obtained from a subject), and the level of that protein biomarker is compared to a level that differs from the level indicated in the rules shown in FIG. 69 and the table below. In some embodiments, the different level of a protein biomarker (e.g., one or more of the protein biomarkers) in the rules shown in FIG. 69 is about 10% different from (e.g., about 10% greater than or 10% less than), about 15% different from (e.g., about 15% greater than or 15% less than), about 20% different from (e.g., about 20% greater than or 20% less than), about 25% different from (e.g., about 25% greater than or 25% less than), about 30% different from (e.g., about 30% greater than or 30% less than), about 35% different from (e.g., about 35% greater than or 35% less than), about 40% different from (e.g., about 40% greater than or 40% less than), about 45% different from (e.g., about 45% greater than or 45% less than), or about 50% different from (e.g., about 50% greater than or 50% less than) the level of that protein biomarker shown at one or more decision points in the rules shown in FIG. 69 and the table below. In some embodiments, the gender (e.g., the biological gender) of the subject is determined and compared to the gender in one or more of the rules shown in FIG. 69 and the table below. In some embodiments, the subject can be determined to be a woman (e.g., “gender %in% c(‘F’)”). In some embodiments, the subject can be determined to be a man (e.g., “gender %in% c(‘M’)”). For the table below, exemplary and non-limiting rules and cancer type predictions are indicated. “Else” indicates that if none of the other exemplary rules are met, the cancer can be predicted to be colorectal.

| Exemplary Rules | Prediction |
| --- | --- |
| CA125 > 102.76 & sFas <= 830.345 & gender %in% c(‘F’) | Ovarian |
| CA199 > 37.65 & CYFRA211 <= 14640.67 & CD44 > 15.67 & Midkine > 289.485 & PAR > 4580.45 & sHER2 > 6935.375 | Pancreatic |
| AFP > 17774.49 & TIMP2 <= 61777.34 & Galectin3 <= 16.72 & Mesothelin <= 27.71 | Liver |
| CA199 <= 71.53 & Leptin > 12927.9 & sFas > 411.525 & TIMP1 <= 59700.835 & TIMP2 <= 37667.97 & gender %in% c(‘F’) | Breast |
| CA125 <= 104.41 & CA153 <= 16.22 & CA199 <= 117.275 & HGF <= 465.81 & Leptin <= 7244.265 & SHBG > 29.53 | CRC |
| Prolactin > 37214.25 & sFas > 1046.435 & TIMP1 <= 93517.645 & DKK1 <= 1.095 & sHER2 <= 6054.825 & gender %in% c(‘M’) | Lung |
| CA153 <= 15.285 & IL8 > 39.235 & IL8 <= 163.64 & sFas <= 1098.015 & Myeloperoxidase > 19.325 & sHER2 <= 5172.43 | Stomach/Esophageal |
| AFP <= 2867.07 & CA153 > 9.505 & Prolactin > 37962.925 & sFas > 688.47 & Myeloperoxidase <= 11.935 & Thrombospondin2 <= 3273.685 | Lung |
| AFP <= 36492.58 & CA125 <= 10.475 & Prolactin <= 275955.405 & sFas <= 1351.39 & AXL > 1999.77 & sHER2 <= 10172.47 | CRC |
| CEA <= 1892.955 & sFas <= 1076.05 & TIMP2 > 40306.87 & CD44 <= 26.915 & sHER2 <= 9959.43 & gender %in% c(‘F’) | Ovarian |
| Leptin <= 5186.325 & OPN > 94670.57 & TIMP1 > 75617.725 & SHBG <= 141.115 & sHER2 <= 8369.265 & Thrombospondin2 <= 17040.89 | Stomach/Esophageal |
| AFP <= 40375.115 & CA153 <= 20.835 & CEA > 3057.27 & Leptin <= 237948.935 & OPN <= 250628.405 & TIMP2 <= 68136.905 | CRC |
| AFP <= 299906.424 & sFas > 1100.55 & TIMP1 <= 85064.65 & TIMP2 <= 53195.905 & DKK1 <= 1.285 & gender %in% c(‘F’) | Breast |
| Leptin > 8839.01 & sFas <= 1745.34 & TIMP1 > 63205.505 & CD44 <= 19.735 & Mesothelin <= 39.705 & gender %in% c(‘F’) | CRC |
| AFP <= 5369.16 & CA153 <= 16.21 & CEA > 1374.755 & Myeloperoxidase <= 368.56 & Midkine <= 4401.495 & sHER2 <= 10713.16 | CRC |
| TIMP2 <= 61631.2 & Myeloperoxidase > 23.725 & SHBG <= 96.985 & DKK1 <= 1.335 & Midkine <= 348.515 & sHER2 <= 5523.84 | Stomach/Esophageal |
| CA125 <= 56.21 & HGF <= 928.54 & Prolactin > 65977.55 & sFas <= 2849.195 & Mesothelin > 13.68 & AXL <= 2093.75 | Lung |
| CA199 <= 36.795 & CEA <= 115717.2 & HE4 <= 15657.71 & Leptin > 15287.95 & sFas > 716.205 & gender %in% c(‘F’) | Breast |
| Else | CRC |
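The rule list above can be applied programmatically. The sketch below transcribes only the first three exemplary rules and the “Else” default. It assumes that rules are evaluated in the listed order with the first match determining the prediction, and that a rule whose biomarker was not measured does not match; neither assumption is stated in the disclosure. The data layout and function names are hypothetical.

```python
import operator

# Comparison operators appearing in the exemplary rules.
OPS = {">": operator.gt, "<=": operator.le, "==": operator.eq}

# First three exemplary rules only; each rule is (list of conditions, prediction).
RULES = [
    ([("CA125", ">", 102.76), ("sFas", "<=", 830.345), ("gender", "==", "F")],
     "Ovarian"),
    ([("CA199", ">", 37.65), ("CYFRA211", "<=", 14640.67), ("CD44", ">", 15.67),
      ("Midkine", ">", 289.485), ("PAR", ">", 4580.45), ("sHER2", ">", 6935.375)],
     "Pancreatic"),
    ([("AFP", ">", 17774.49), ("TIMP2", "<=", 61777.34),
      ("Galectin3", "<=", 16.72), ("Mesothelin", "<=", 27.71)],
     "Liver"),
]


def rule_matches(conditions, sample):
    """True only if every condition holds; a missing measurement fails the rule."""
    return all(
        name in sample and OPS[op](sample[name], value)
        for name, op, value in conditions
    )


def predict_tissue(sample, rules=RULES, default="CRC"):
    """Return the prediction of the first matching rule, else the default."""
    for conditions, prediction in rules:
        if rule_matches(conditions, sample):
            return prediction
    return default
```

For instance, a sample with CA125 = 150.0, sFas = 500.0, and gender 'F' satisfies the first rule, while the same measurements with gender 'M' satisfy none of the three transcribed rules and fall through to the default.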

For FIGS. 68 and 69 and the table shown above, the units of certainprotein biomarkers are indicated in the key below:

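| Biomarker | Unit |
| --- | --- |
| CA19-9 | U/ml |
| CEA | pg/ml |
| CA125 | U/ml |
| AFP | pg/ml |
| Prolactin | pg/ml |
| HGF | pg/ml |
| OPN | pg/ml |
| TIMP-1 | pg/ml |
| TIMP-2 | pg/ml |
| Mesothelin | ng/ml |
| Midkine | pg/ml |
| Kallikrein-6 | pg/ml |
| CD44 | ng/ml |
| Angiopoietin-2 | pg/ml |
| Endoglin | pg/ml |
| Follistatin | pg/ml |
| G-CSF | pg/ml |
| GDF15 | ng/ml |
| DKK1 | ng/ml |
| NSE | ng/ml |
| OPG | ng/ml |
| AXL | pg/ml |
| sHER2/sEGFR2/sErbB2 | pg/ml |
| Thrombospondin-2 | pg/ml |
| sEGFR | pg/ml |
| PAR | pg/ml |
| sPECAM-1 | pg/ml |
| CA15-3 | U/ml |
| Leptin | pg/ml |
| IL-6 | pg/ml |
| IL-8 | pg/ml |
| sFas | pg/ml |
| FGF2 | pg/ml |
| CYFRA 21-1 | pg/ml |
| HE4 | pg/ml |
| TGFa | pg/ml |
| Galectin-3 | ng/ml |
| Myeloperoxidase | ng/ml |
| SHBG | nM |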

Detecting Genetic Biomarkers

Any of a variety of techniques can be used to detect the presence of oneor more genetic biomarkers (e.g., mutations) present in a sample (e.g.,a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/orplasma sample) obtained from a subject. Non-limiting examples of suchtechniques include a PCR-based multiplex assay, a digital PCR assay, adroplet digital PCR (ddPCR) assay, a PCR-based singleplex PCR assay, aSanger sequencing assay, a next-generation sequencing assay, aquantitative PCR assay, a ligation assay, and a microarray assay. Thoseof ordinary skill in the art will be aware of other suitable techniquesfor detecting the presence of one or more genetic biomarkers (e.g.,mutations) present in a sample obtained from a subject.

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using a sequencing method (e.g., a PCR-based sequencing method). Any appropriate number of nucleotides can be sequenced. Nucleotides sequenced in the methods provided herein can be contiguous or non-contiguous. In some embodiments, no more than 20,000 (e.g., about 2000, about 2500, about 3000, about 3500, about 4000, about 5000, about 6000, about 7000, about 8000, about 9000, about 10,000, about 15,000, or about 20,000) nucleotides are sequenced. In some embodiments, at least 200 (e.g., about 300, about 400, about 500, about 600, about 700, about 800, about 900, about 1000, about 1100, about 1200, about 1300, about 1400, about 1500, about 1600, about 1700, about 1800, about 1900, or about 2000) nucleotides are sequenced. In some embodiments, from about 200 to about 20,000 (e.g., from about 200 to about 20,000, from about 300 to about 15,000, from about 400 to about 10,000, from about 500 to about 9000, from about 600 to about 8000, from about 700 to about 7000, from about 800 to about 6000, from about 900 to about 5000, from about 1000 to about 4000, from about 1100 to about 3500, from about 1200 to about 3000, from about 1300 to about 2500, or from about 1500 to about 2000) nucleotides are sequenced. In some embodiments, 300+/−15%, 400+/−15%, 500+/−15%, 600+/−15%, 700+/−15%, 800+/−15%, 900+/−15%, 1000+/−15%, 1100+/−15%, 1200+/−15%, 1300+/−15%, 1400+/−15%, 1500+/−15%, 1600+/−15%, 1700+/−15%, 1800+/−15%, 1900+/−15%, 2000+/−15%, 2500+/−15%, 3000+/−15%, 3500+/−15%, 4000+/−15%, 5000+/−15%, 6000+/−15%, 7000+/−15%, 8000+/−15%, 9000+/−15%, 10,000+/−15%, 15,000+/−15%, or 20,000+/−15% nucleotides are sequenced.

In some embodiments of methods provided herein, the presence of one or more mutations present in a sample obtained from a subject is detected by sequencing regions of interest. Any appropriate number of regions of interest can be sequenced. In some embodiments, no more than 70 (e.g., about 68, about 65, about 62, about 61, about 60, about 58, about 55, about 52, about 50, about 45, about 40, about 35, or about 30) regions of interest are sequenced. In some embodiments, at least 30 (e.g., about 30, about 35, about 40, about 45, about 48, about 50, about 53, about 55, about 58, about 60, about 61, about 65, about 68, or about 70) regions of interest are sequenced. In some embodiments, from about 30 to about 70 (e.g., from about 35 to about 70, from about 40 to about 70, from about 45 to about 70, from about 50 to about 70, from about 55 to about 70, from about 60 to about 70, from about 65 to about 70, from about 30 to about 65, from about 30 to about 60, from about 30 to about 55, from about 30 to about 50, from about 30 to about 45, from about 30 to about 40, from about 30 to about 35, from about 35 to about 65, from about 40 to about 60, from about 45 to about 55, from about 40 to about 50, from about 50 to about 60, or from about 55 to about 65) regions of interest are sequenced. A region of interest can be any appropriate size (e.g., can include any appropriate number of nucleotides). In some embodiments, a region of interest can include no more than 800 (e.g., about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 200, about 300, about 400, about 500, about 600, about 700, or about 800) nucleotides. In some embodiments, a region of interest can include at least 6 (e.g., about 6, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, or about 50) nucleotides.
In some embodiments, a region of interest can include from about 6 to about 800 (e.g., from about 6 bp to about 800 bp, from about 10 bp to about 700 bp, from about 15 bp to about 600 bp, from about 20 bp to about 600 bp, from about 25 bp to about 500 bp, from about 30 bp to about 400 bp, from about 35 bp to about 300 bp, from about 40 bp to about 200 bp, from about 45 bp to about 100 bp, from about 50 bp to about 95 bp, from about 55 bp to about 90 bp, or from about 14 bp to about 42 bp) nucleotides. In some embodiments, the number of regions of interest sequenced can be no more than about 300% (e.g., about 200%, about 150%, or about 125%) of the lowest number of regions of interest that can be used in methods provided herein and achieve a plateau for sensitivity (see, e.g., Example 1). Any number of nucleotides within a region of interest can be sequenced. In some embodiments, no more than 300 (e.g., no more than about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 40, about 45, about 50, about 55, about 60, about 100, about 200, or about 300) nucleotides within a region of interest can be sequenced. In some embodiments, at least 6 (e.g., at least about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, or about 20) nucleotides within a region of interest can be sequenced.
In some embodiments, from about 6 to about 300 (e.g., from about 6 to about 300, from about 7 to about 200, from about 8 to about 100, from about 9 to about 60, from about 10 to about 55, from about 11 to about 50, from about 12 to about 45, from about 13 to about 40, from about 14 to about 35, from about 15 to about 34, from about 14 to about 33, from about 15 to about 32, from about 16 to about 31, from about 17 to about 30, from about 18 to about 29, from about 19 to about 28, or from about 20 to about 27) nucleotides within a region of interest can be sequenced. In some embodiments, about 24, about 28, about 31, about 33, about 37, about 42, or about 51 nucleotides within a region of interest can be sequenced.
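As an illustration only, the total number of nucleotides sequenced can be related to the number of regions of interest and the number of nucleotides sequenced within each region. The following minimal sketch assumes every region contributes the same number of sequenced nucleotides; the function names and the pairing of 61 regions with 33 nucleotides are hypothetical choices drawn from the exemplary values listed above, not a prescribed panel design.

```python
def total_sequenced_nucleotides(n_regions: int, nt_per_region: int) -> int:
    """Total nucleotides sequenced, assuming equal-sized regions (an assumption)."""
    return n_regions * nt_per_region


def within_tolerance(value: float, nominal: float, tolerance: float = 0.15) -> bool:
    """True when value lies within nominal +/- tolerance (default 15%)."""
    return abs(value - nominal) <= nominal * tolerance


# Example: about 61 regions of interest with about 33 nucleotides sequenced in each.
total = total_sequenced_nucleotides(61, 33)
print(total, within_tolerance(total, 2000))
```

Under these assumptions the example total falls inside the 2000+/−15% window recited above; other combinations of the listed values can be checked the same way.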

In some embodiments of methods provided herein, the presence of one or more mutations present in a sample obtained from a subject is detected using a PCR-based sequencing method. For example, the presence of one or more mutations present in a region of interest can be detected by amplifying DNA in regions of interest such that each amplicon corresponds to a region of interest (e.g., a region of interest including one or more genetic biomarkers). An amplicon can be any appropriate size (e.g., can include any appropriate number of nucleotides). In some embodiments, an amplicon can include no more than 1000 (e.g., about 50, about 55, about 60, about 65, about 70, about 75, about 80, about 85, about 90, about 95, about 100, about 200, about 300, about 400, about 500, about 600, about 700, about 800, or about 900) nucleotides. In some embodiments, an amplicon can include at least 6 (e.g., about 6, about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, or about 50) nucleotides. In some embodiments, an amplicon can include from about 15 bp to about 1000 bp (e.g., from about 6 bp to about 800 bp, from about 10 bp to about 700 bp, from about 15 bp to about 600 bp, from about 20 bp to about 600 bp, from about 25 bp to about 500 bp, from about 30 bp to about 400 bp, from about 35 bp to about 300 bp, from about 40 bp to about 200 bp, from about 45 bp to about 100 bp, from about 50 bp to about 95 bp, from about 55 bp to about 90 bp, from about 66 bp to about 80 bp, from about 25 bp to about 1000 bp, from about 35 bp to about 1000 bp, from about 50 bp to about 1000 bp, from about 100 bp to about 1000 bp, from about 250 bp to about 1000 bp, from about 500 bp to about 1000 bp, from about 750 bp to about 1000 bp, from about 15 bp to about 750 bp, from about 15 bp to about 500 bp, from about 15 bp to about 300 bp, from about 15 bp to about 200 bp, from about 15 bp to about 100 bp, from about 15 bp to about 80 bp, from about 15 bp to about 75 bp, from about 15 bp to about 50 bp, from about 15 bp to about 40 bp, from about 15 bp to about 30 bp, from about 15 bp to about 20 bp, from about 20 bp to about 100 bp, from about 25 bp to about 50 bp, or from about 30 bp to about 40 bp). For example, amplicons produced using multiplex PCR-based sequencing can include about 33 nucleotides. Any appropriate number of amplicons can be sequenced. In some embodiments, no more than 70 (e.g., about 68, about 65, about 62, about 61, about 60, about 58, about 55, about 52, about 50, about 45, about 40, about 35, or about 30) amplicons are sequenced. In some embodiments, at least 30 (e.g., about 30, about 35, about 40, about 45, about 48, about 50, about 53, about 55, about 58, about 60, about 61, about 65, about 68, or about 70) amplicons are sequenced. In some embodiments, from about 30 to about 70 (e.g., from about 35 to about 70, from about 40 to about 70, from about 45 to about 70, from about 50 to about 70, from about 55 to about 70, from about 60 to about 70, from about 65 to about 70, from about 30 to about 65, from about 30 to about 60, from about 30 to about 55, from about 30 to about 50, from about 30 to about 45, from about 30 to about 40, from about 30 to about 35, from about 35 to about 65, from about 40 to about 60, from about 45 to about 55, from about 40 to about 50, from about 50 to about 60, or from about 55 to about 65) amplicons are sequenced. In some cases, families of amplicons are formed in which each member of a family is derived from a single template molecule (e.g., a single region of interest) in the cell-free DNA, and where each member of a family is marked by a common oligonucleotide barcode, and where each family is marked by a distinct oligonucleotide barcode.

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using a method that can increase the sensitivity of massively parallel sequencing instruments with an error reduction technique. For example, such techniques can permit the detection of rare mutant alleles in a range of 1 mutant template among 100 to 1,000,000 wild-type templates (e.g., 500 to 1,000,000 wild-type templates). In some embodiments, such techniques can permit the detection of rare mutant alleles when they are present as a low fraction of the total number of templates (e.g., when rare mutant alleles are present at a fraction of less than about 1%, less than about 0.1%, less than about 0.01%, less than about 0.001%, or less than about 0.00001% of total templates, or lower, or any fraction between these exemplary fractions). In some embodiments, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample obtained from a subject is detected by amplifying DNA (e.g., DNA obtained from cells in a sample or cell-free DNA) to form families of amplicons in which each member of a family is derived from a single template molecule in the cell-free DNA, wherein each member of a family is marked by a common oligonucleotide barcode, and wherein each family is marked by a distinct oligonucleotide barcode. For example, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample obtained from a subject can be detected by molecularly assigning a unique identifier (UID) to each template molecule, amplifying each uniquely tagged template molecule to create UID-families, and redundantly sequencing the amplification products.
In some embodiments, the oligonucleotide barcode is introduced into the template molecule by a step of amplifying with a population of primers that collectively contain a plurality of oligonucleotide barcodes. In some embodiments, the oligonucleotide barcode is endogenous to the template molecule, and an adapter including a DNA synthesis priming site is ligated to an end of the template molecule adjacent to the oligonucleotide barcode. See, e.g., Kinde I, Wu J, Papadopoulos N, Kinzler K W, Vogelstein B (2011) Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci USA 108:9530-9535, the contents of which are incorporated herein by reference in their entirety.
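The grouping of redundantly sequenced reads into UID-families can be sketched in a few lines. The following is a minimal illustration, not the procedure of the cited reference: it assumes reads arrive as (UID, sequence) pairs that are already trimmed to equal length, and the function name, the minimum family size, and the 95% agreement threshold are hypothetical parameters chosen for the example.

```python
from collections import Counter, defaultdict


def call_uid_consensus(reads, min_family_size=2, min_agreement=0.95):
    """Group reads by UID and return one consensus sequence per UID-family.

    reads: iterable of (uid, sequence) pairs; sequences within a family are
    assumed to be aligned and of equal length (an assumption of this sketch).
    """
    families = defaultdict(list)
    for uid, seq in reads:
        families[uid].append(seq)

    consensus = {}
    for uid, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to separate errors from real variants
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            # Keep a base only when nearly all family members agree on it.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[uid] = "".join(bases)
    return consensus


reads = [
    ("AAAT", "ACGT"), ("AAAT", "ACGT"), ("AAAT", "ACGT"),
    ("CCGA", "ACTT"), ("CCGA", "ACTT"),
    ("GGGG", "ACGT"),  # singleton family, dropped
]
print(call_uid_consensus(reads))
```

A variant that appears in nearly every member of a family is likely to have been present in the original template, whereas an error introduced during amplification or sequencing tends to appear in only a subset of the family and is masked here as "N".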

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using a method that can increase the accuracy of a sequencing reaction. For example, the sequencing depth of a sequencing reaction can increase the accuracy of a sequencing reaction. In some embodiments, each region of interest can be sequenced at a sequencing depth of no more than 500× (e.g., about 5×, about 10×, about 25×, about 50×, about 100×, about 150×, about 200×, about 300×, about 400×, or about 500×). In some embodiments, each region of interest can be sequenced at a sequencing depth of from about 5× to about 500× (e.g., from about 5× to about 400×, from about 5× to about 300×, from about 5× to about 200×, from about 5× to about 100×, from about 5× to about 50×, from about 10× to about 500×, from about 25× to about 500×, from about 50× to about 500×, from about 100× to about 500×, from about 200× to about 500×, from about 300× to about 500×, from about 400× to about 500×, from about 10× to about 400×, from about 25× to about 300×, or from about 50× to about 200×). In some embodiments, each region of interest can be sequenced at a sequencing depth of at least 50,000 (e.g., about 50,000, about 75,000, about 100,000, about 125,000, or about 150,000) reads per base. In some embodiments, each region of interest can be sequenced at a sequencing depth of no more than 150,000 (e.g., about 50,000, about 75,000, about 100,000, about 125,000, or about 150,000) reads per base. In some embodiments, each region of interest can be sequenced at a sequencing depth of from about 50,000 to about 150,000 (e.g., from about 50,000 to about 125,000, from about 50,000 to about 100,000, from about 50,000 to about 75,000, from about 75,000 to about 150,000, or from about 100,000 to about 150,000) reads per base.
In some embodiments, the sequencing reaction can be performed at a depth sufficient to detect a mutation (e.g., in a region of interest) at a frequency as low as 0.0005%.
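The relationship between sequencing depth and the chance of observing a rare mutation can be illustrated with a simple binomial model. The sketch below is an idealization offered for intuition only: it assumes each read at a position independently samples a mutant template with probability equal to the mutant fraction and ignores sequencing and amplification error; the function names and example values are hypothetical.

```python
import math


def expected_mutant_reads(depth: int, mutant_fraction: float) -> float:
    """Mean number of mutant reads at one position under the binomial model."""
    return depth * mutant_fraction


def prob_at_least_k_mutant_reads(depth: int, mutant_fraction: float, k: int = 1) -> float:
    """P(at least k mutant reads) = 1 - P(fewer than k), binomial(depth, fraction)."""
    p_fewer = sum(
        math.comb(depth, i)
        * mutant_fraction ** i
        * (1.0 - mutant_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Example: compare two depths for a mutation present in 0.1% of templates,
# requiring at least 3 supporting reads (a hypothetical calling rule).
for depth in (5_000, 50_000):
    print(depth, expected_mutant_reads(depth, 0.001),
          prob_at_least_k_mutant_reads(depth, 0.001, k=3))
```

Under this model, lowering the mutant fraction by a given factor requires raising the depth by roughly the same factor to keep the expected number of supporting reads constant, which is one reason error reduction techniques are paired with deep sequencing.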

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, or plasma sample) obtained from a subject is detected using sequencing technology (e.g., a next-generation sequencing technology). A variety of sequencing technologies are known in the art. For example, a variety of technologies for detection and characterization of circulating tumor DNA in cell-free DNA are described in Haber and Velculescu, Blood-Based Analyses of Cancer: Circulating Tumor Cells and Circulating Tumor DNA, Cancer Discov., June; 4(6):650-61. doi: 10.1158/2159-8290.CD-13-1014, 2014, incorporated herein by reference in its entirety. Non-limiting examples of such techniques include SafeSeqS (Kinde et al., Detection and quantification of rare mutations with massively parallel sequencing, Proc Natl Acad Sci USA; 108, 9530-5, 2011), TamSeq (Forshew et al., Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA, Sci Transl Med; 4:136ra68, 2012), and OnTarget (Thompson et al., Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment. PLoS ONE, 7:e31597, 2012), each of which is incorporated herein by reference in its entirety. In some embodiments, the presence of one or more mutations present in a sample obtained from a subject is detected using droplet digital PCR (ddPCR), a method that is known to be highly sensitive for mutation detection. In some embodiments, the presence of one or more mutations present in a sample obtained from a subject is detected using other sequencing technologies, including but not limited to, chain-termination techniques, shotgun techniques, sequencing-by-synthesis methods, methods that utilize microfluidics, other capture technologies, or any of the other sequencing techniques known in the art that are useful for detection of small amounts of DNA in a sample (e.g., ctDNA in a cell-free DNA sample).

In some embodiments, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using array-based methods. For example, the step of detecting a genetic alteration (e.g., one or more genetic alterations) in cell-free DNA can be performed using a DNA microarray. In some embodiments, a DNA microarray can detect one or more of a plurality of genetic biomarkers (e.g., cancer cell mutations). In some embodiments, cell-free DNA is amplified prior to detecting the genetic biomarker (e.g., the genetic alteration). Non-limiting examples of array-based methods that can be used in any of the methods described herein include: a complementary DNA (cDNA) microarray (Kumar et al. (2012) J. Pharm. Bioallied Sci. 4(1): 21-26; Laere et al. (2009) Methods Mol. Biol. 512: 71-98; Mackay et al. (2003) Oncogene 22: 2680-2688; Alizadeh et al. (1996) Nat. Genet. 14: 457-460), an oligonucleotide microarray (Kim et al. (2006) Carcinogenesis 27(3): 392-404; Lodes et al. (2009) PLoS One 4(7): e6229), a bacterial artificial chromosome (BAC) clone chip (Chung et al. (2004) Genome Res. 14(1): 188-196; Thomas et al. (2005) Genome Res. 15(12): 1831-1837), a single-nucleotide polymorphism (SNP) microarray (Mao et al. (2007) Curr. Genomics 8(4): 219-228; Jasmine et al. (2012) PLoS One 7(2): e31968), a microarray-based comparative genomic hybridization array (array-CGH) (Beers and Nederlof (2006) Breast Cancer Res. 8(3): 210; Pinkel et al. (2005) Nat. Genetics 37: S11-S17; Michels et al. (2007) Genet. Med. 9: 574-584), and a molecular inversion probe (MIP) assay (Wang et al. (2012) Cancer Genet 205(7-8): 341-55; Lin et al. (2010) BMC Genomics 11: 712). In some embodiments, the cDNA microarray is an Affymetrix microarray (Irizarry (2003) Nucleic Acids Res 31:e15; Dalma-Weiszhausz et al. (2006) Methods Enzymol. 410: 3-28), a NimbleGen microarray (Wei et al. (2008) Nucleic Acids Res 36(9): 2926-2938; Albert et al. (2007) Nat. Methods 4: 903-905), an Agilent microarray (Hughes et al. (2001) Nat. Biotechnol. 19(4): 342-347), or a BeadArray array (Liu et al. (2017) Biosens Bioelectron 92: 596-601). In some embodiments, the oligonucleotide microarray is a DNA tiling array (Mockler and Ecker (2005) Genomics 85(1): 1-15; Bertone et al. (2006) Genome Res 16(2): 271-281). Other suitable array-based methods are known in the art.

In some embodiments, once a subject has been determined to have a cancer (e.g., an ovarian cancer, an endometrial cancer, a bladder cancer, or a UTUC), the subject may be additionally monitored or selected for increased monitoring. In some embodiments, methods provided herein can be used to select a subject for increased monitoring at a time period prior to the time period when conventional techniques are capable of diagnosing the subject with an early-stage cancer. For example, methods provided herein for selecting a subject for increased monitoring can be used when a subject has not been diagnosed with cancer by conventional methods and/or when a subject is not known to harbor a cancer. In some embodiments, a subject selected for increased monitoring can be administered a diagnostic test (e.g., any of the diagnostic tests disclosed herein) at an increased frequency compared to a subject that has not been selected for increased monitoring. For example, a subject selected for increased monitoring can be administered a diagnostic test at a frequency of twice daily, daily, bi-weekly, weekly, bi-monthly, monthly, quarterly, semi-annually, annually, or at any frequency therein. In some embodiments, a subject selected for increased monitoring can be administered one or more additional diagnostic tests compared to a subject that has not been selected for increased monitoring. For example, a subject selected for increased monitoring can be administered two diagnostic tests, whereas a subject that has not been selected for increased monitoring is administered only a single diagnostic test (or no diagnostic tests). In some embodiments, a subject that has been selected for increased monitoring can also be selected for further diagnostic testing.
Once the presence of a cancer cell has been identified (e.g., by any of the variety of methods disclosed herein), it may be beneficial for the subject to undergo both increased monitoring (e.g., to assess the progression of the tumor or cancer in the subject and/or to assess the development of additional genetic biomarkers (e.g., cancer cell mutations) and/or aneuploidy), and further diagnostic testing (e.g., to determine the size and/or exact location of the tumor harboring the cancer cell). In some embodiments, a therapeutic intervention is administered to the subject that is selected for increased monitoring after a genetic biomarker (e.g., a cancer cell mutation) and/or aneuploidy is detected. Any of the therapeutic interventions disclosed herein or known in the art can be administered. For example, a subject that has been selected for increased monitoring can be further monitored, and a therapeutic intervention can be administered if the presence of the cancer cell is maintained throughout the increased monitoring period. Additionally or alternatively, a subject that has been selected for increased monitoring can be administered a therapeutic intervention, and further monitored as the therapeutic intervention progresses. In some embodiments, after a subject that has been selected for increased monitoring has been administered a therapeutic intervention, the increased monitoring will reveal one or more genetic biomarkers (e.g., one or more additional cancer cell mutations) and/or aneuploidy. In some embodiments, such one or more genetic biomarkers (e.g., one or more additional cancer cell mutations) and/or aneuploidy will provide cause to administer a different therapeutic intervention (e.g., a resistance mutation may arise in a cancer cell during the therapeutic intervention, which cancer cell harboring the resistance mutation is resistant to the original therapeutic intervention).

In some embodiments, once a subject has been determined to have a cancer (e.g., an ovarian cancer, an endometrial cancer, a bladder cancer, or a UTUC), the subject may be administered further tests or selected for further diagnostic testing. In some embodiments, methods provided herein can be used to select a subject for further diagnostic testing at a time period prior to the time period when conventional techniques are capable of diagnosing the subject with an early-stage cancer. For example, methods provided herein for selecting a subject for further diagnostic testing can be used when a subject has not been diagnosed with cancer by conventional methods and/or when a subject is not known to harbor a cancer. In some embodiments, a subject selected for further diagnostic testing can be administered a diagnostic test (e.g., any of the diagnostic tests disclosed herein) at an increased frequency compared to a subject that has not been selected for further diagnostic testing. For example, a subject selected for further diagnostic testing can be administered a diagnostic test at a frequency of twice daily, daily, bi-weekly, weekly, bi-monthly, monthly, quarterly, semi-annually, annually, or at any frequency therein. In some embodiments, a subject selected for further diagnostic testing can be administered one or more additional diagnostic tests compared to a subject that has not been selected for further diagnostic testing. For example, a subject selected for further diagnostic testing can be administered two diagnostic tests, whereas a subject that has not been selected for further diagnostic testing is administered only a single diagnostic test (or no diagnostic tests). In some embodiments, the diagnostic testing method can determine the presence of the same type of cancer as the cancer that was originally detected. Additionally or alternatively, the diagnostic testing method can determine the presence of a different type of cancer than the cancer that was originally detected.
In some embodiments, the diagnostic testing method is a scan. In some embodiments, the scan is a computed tomography (CT), a CT angiography (CTA), an esophagram (a barium swallow), a barium enema, a magnetic resonance imaging (MRI), a PET scan, an ultrasound (e.g., an endobronchial ultrasound, an endoscopic ultrasound), an X-ray, or a DEXA scan. In some embodiments, the diagnostic testing method is a physical examination, such as an anoscopy, a bronchoscopy (e.g., an autofluorescence bronchoscopy, a white-light bronchoscopy, a navigational bronchoscopy), a colonoscopy, a digital breast tomosynthesis, an endoscopic retrograde cholangiopancreatography (ERCP), an esophagogastroduodenoscopy, a mammography, a Pap smear, a pelvic exam, or a positron emission tomography and computed tomography (PET-CT) scan. In some embodiments, a subject that has been selected for further diagnostic testing can also be selected for increased monitoring. Once the presence of a cancer cell has been identified (e.g., by any of the variety of methods disclosed herein), it may be beneficial for the subject to undergo both increased monitoring (e.g., to assess the progression of the tumor or cancer in the subject and/or to assess the development of additional genetic biomarkers (e.g., additional cancer cell mutations) and/or aneuploidy), and further diagnostic testing (e.g., to determine the size and/or exact location of the tumor harboring the cancer cell). In some embodiments, a therapeutic intervention is administered to the subject that is selected for further diagnostic testing after a genetic biomarker (e.g., a cancer cell mutation) and/or aneuploidy is detected. Any of the therapeutic interventions disclosed herein or known in the art can be administered. For example, a subject that has been selected for further diagnostic testing can be administered a further diagnostic test, and a therapeutic intervention can be administered if the presence of the cancer cell is confirmed.
Additionally or alternatively, a subject that has been selected for further diagnostic testing can be administered a therapeutic intervention, and can be further monitored as the therapeutic intervention progresses. In some embodiments, after a subject that has been selected for further diagnostic testing has been administered a therapeutic intervention, the additional testing will reveal one or more additional genetic biomarkers (e.g., cancer cell mutations) and/or aneuploidy. In some embodiments, such one or more additional genetic biomarkers (e.g., cancer cell mutations) and/or aneuploidy will provide cause to administer a different therapeutic intervention (e.g., a resistance mutation may arise in a cancer cell during the therapeutic intervention, which cancer cell harboring the resistance mutation is resistant to the original therapeutic intervention).

In some embodiments, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using any of the variety of bottleneck sequencing system methods described in International Patent Application Publication Number WO 2017/132438, the contents of which are incorporated by reference herein in their entirety. Bottleneck Sequencing System (BotSeqS) is a next-generation sequencing method that simultaneously quantifies rare somatic point mutations across the mitochondrial and nuclear genomes. BotSeqS combines molecular barcoding with a simple dilution step immediately prior to library amplification. BotSeqS can be used to show age- and tissue-dependent accumulations of rare mutations and demonstrate that somatic mutational burden in normal tissues can vary by several orders of magnitude, depending on biologic and environmental factors. BotSeqS has been used to show major differences between the mutational patterns of the mitochondrial and nuclear genomes in normal tissues. BotSeqS has shown that the mutation spectra of normal tissues were different from each other, but similar to those of the cancers that arose in them.

According to some embodiments of BotSeqS, a method is provided for obtaining the sequence of a DNA. Adaptors are ligated to ends of random fragments of a DNA population to form a library of adaptor-ligated fragments, such that upon amplification of a fragment in the library of adaptor-ligated fragments, each end of the fragment has a distinct end. The library of adaptor-ligated fragments is diluted to form diluted, adaptor-ligated fragments. At least a portion of the diluted, adaptor-ligated fragments is amplified to form families from a single strand of an adaptor-ligated fragment. Family members are sequenced to obtain nucleotide sequence of a plurality of family members of an adaptor-ligated fragment.

According to some embodiments of BotSeqS, a method is provided for sequencing DNA. Adaptors are ligated to ends of a population of fragmented double-stranded DNA molecules to form a library of adaptor-ligated fragments, such that upon amplification of a fragment in the library of adaptor-ligated fragments, each end of the fragment has a distinct end. The library of adaptor-ligated fragments is diluted to form diluted, adaptor-ligated fragments. At least a portion of the diluted, adaptor-ligated fragments is amplified to form families from a single strand of an adaptor-ligated fragment. Family members are sequenced to obtain nucleotide sequence of a plurality of family members of an adaptor-ligated fragment. Nucleotide sequence of a member of a first family is aligned to a reference sequence. A difference between the member of the first family and the reference sequence is identified. The difference is identified as a potential rare or potential non-clonal mutation if it is found in a second family from an opposite strand of the single strand of the adaptor-ligated fragment.

According to some embodiments of BotSeqS, a method is provided for sequencing DNA. A double-stranded DNA population from a sample is randomly fragmented to form a library of fragments. Adaptors are ligated to ends of the fragments to form a library of adaptor-ligated fragments, such that upon amplification of a fragment in the library of adaptor-ligated fragments, each end of the fragment has a distinct end. The library of adaptor-ligated fragments is diluted to form diluted, adaptor-ligated fragments. At least a portion of the diluted, adaptor-ligated fragments is amplified to form families from a single strand of an adaptor-ligated fragment. Family members are sequenced to obtain nucleotide sequence of a plurality of family members of an adaptor-ligated fragment. Nucleotide sequence of a member of a first family is aligned to a reference sequence. A difference between the member of the first family and the reference sequence is identified. The difference is identified as a potential rare or potential non-clonal mutation if it is found in a second family from an opposite strand of the single strand of the adaptor-ligated fragment.
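The final step recited above, in which a difference from the reference is retained only when it is also found in the family derived from the opposite strand, can be sketched as a set intersection. This is a minimal illustration under stated assumptions: variants have already been called per family against a reference, each fragment is identified by a hypothetical `fragment_id` (for example, its mapping coordinates), and the strand labels "watson" and "crick" and the function name are choices made for this example.

```python
def duplex_confirmed_variants(families):
    """Return variants seen in both strand families of the same fragment.

    families: dict mapping (fragment_id, strand) -> set of variants, where
    strand is "watson" or "crick" and each variant is a
    (position, ref_base, alt_base) tuple.
    """
    confirmed = {}
    for (fragment_id, strand), variants in families.items():
        if strand != "watson":
            continue  # visit each fragment once, via its Watson family
        crick = families.get((fragment_id, "crick"))
        if crick is None:
            continue  # no opposite-strand family, so nothing can be confirmed
        shared = variants & crick
        if shared:
            confirmed[fragment_id] = shared
    return confirmed


families = {
    ("chr1:1000-1150", "watson"): {(1042, "C", "T"), (1100, "G", "A")},
    ("chr1:1000-1150", "crick"): {(1042, "C", "T")},
    ("chr2:500-640", "watson"): {(512, "A", "G")},  # no Crick family
}
print(duplex_confirmed_variants(families))
```

Requiring support from both strands discards changes that arose on only one strand, such as damage or early amplification errors, at the cost of ignoring fragments for which only one strand family was recovered.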

In some embodiments of detecting the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject, rare somatic point mutations are quantified across the mitochondrial and nuclear genomes. One or more embodiments of such methods are referred to informally as BotSeqS, which is short for Bottleneck Sequencing System. Using molecular barcoding (exogenous or endogenous) and a simple dilution step immediately prior to library amplification, the method permits, for example, determining mutational burden based on age or tissue type of normal tissues. Various BotSeqS methods described herein can also be used to demonstrate the effect of mutagens and environmental insults on mutation rate. Various BotSeqS methods described herein are designed to accurately detect rare point mutations in any molecularly-barcoded library in a completely unbiased fashion.

BotSeqS can be used with any molecular barcoding strategy, such as endogenous position-demarcated barcodes, described in Kinde, I, et al., Detection and quantification of rare mutations with massively parallel sequencing. Proceedings of the National Academy of Sciences of the United States of America 108, 9530-9535 (2011), and exogenously added matched barcodes (Kinde, I, et al., Detection and quantification of rare mutations with massively parallel sequencing. Proceedings of the National Academy of Sciences of the United States of America 108, 9530-9535 (2011); Jabara et al., Accurate sampling and deep sequencing of the HIV-1 protease gene using a Primer ID. Proceedings of the National Academy of Sciences of the United States of America 108, 20166-20171 (2011); Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proceedings of the National Academy of Sciences of the United States of America 109, 14508-14513 (2012); Hiatt et al., Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation. Genome research 23, 843-854 (2013); Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nature methods 9, 72-74 (2012); Kinde, I. et al. Evaluation of DNA from the Papanicolaou test to detect ovarian and endometrial cancers. Science translational medicine 5, 167ra164 (2013); Kumar, A. et al. Deep sequencing of multiple regions of glial tumors reveals spatial heterogeneity for mutations in clinically relevant genes. Genome biology 15, 530 (2014); Keys, J. R. et al. Primer ID Informs Next-Generation Sequencing Platforms and Reveals Preexisting Drug Resistance Mutations in the HIV-1 Reverse Transcriptase Coding Domain. AIDS Res Hum Retroviruses 31, 658-668 (2015)).
In some embodiments, BotSeqS measuresvery rare mutations, genome-wide in a completely unbiased fashion,whereas SafeSeqS measures relatively frequent but not clonal mutations(i.e., “sub-clonal”) at pre-defined targeted loci.

Conceptually, BotSeqS can be envisioned as achieving low coverage of randomly sampled genomic loci, whereas Safe-SeqS works through ultra-high coverage of a targeted locus.

Low genomic coverage, which can be seen as a feature of various BotSeqS methods described herein, permits rare mutations to constitute a major portion of the signal at that genomic position, contributing to the sensitivity of the method. The applications of such methods are varied. They can be used to measure very rare somatic mutations. They can be used to assess somatic mosaicism, cell lineage development, theories on aging, environmental carcinogen exposure, and cancer risk assessment. Many of these applications are demonstrated in the examples herein.

Various filters can be applied to the data that are generated with various BotSeqS sequencing methods. One filter that can be applied is for mtDNA only; Watson AND Crick duplicate families only, excluding templates that include high frequency mutations (i.e., homopolymers, >1 mutation per template) and excluding templates that map to RepeatMasker. Another filter that can be applied is for nuclear DNA only; Watson AND Crick duplicate families only, excluding templates that include high frequency mutations (i.e., homopolymers, >1 mutation per template) and excluding templates that map to repetitive DNA or structural variants. Another filter that can be applied is for mtDNA only, single-base substitution only, average quality score of greater than or equal to 30, Read 1 >=2 Watson duplicates with >=90% mutation fraction only, Read 2 >=2 Crick duplicates with >=90% mutation fraction only, exclude all variants called in WGS, exclude all variants in dbSNP142, exclude calls that map to RepeatMasker, exclude visual artifacts and high frequency mutations (i.e., homopolymers, cycle 6 and 7, >1 change per template, >1 template per change). Yet another filter that can be applied is nuclear DNA only, single-base substitution only, average quality score >=30, Read 1 >=2 PCR duplicates with >=90% mutation fraction only, Read 2 >=2 PCR duplicates with >=90% mutation fraction only, exclude all variants called in WGS, exclude all variants in dbSNP130 and dbSNP142, exclude calls that map to repetitive DNA or structural variants, exclude visual artifacts and high frequency mutations (i.e., homopolymers, cycle 6 and 7, >1 change per template).
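
By way of non-limiting illustration, the nuclear DNA filter described above can be sketched in Python as follows. The `Candidate` record, its field names, and the `passes_nuclear_filter` function are hypothetical and are introduced only for this example. The thresholds (average quality score >=30, >=2 duplicates per read, >=90% mutation fraction) are taken from the filter description, and the annotation flags (WGS, dbSNP, repeats, artifacts) are assumed to have been computed by upstream steps that are not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate variant (field names are illustrative)."""
    genome: str                     # "nuclear" or "mtDNA"
    single_base_substitution: bool  # True if the change is a single-base substitution
    avg_quality: float              # average quality score of the variant base
    read1_duplicates: int           # duplicates supporting the variant in Read 1
    read1_mut_fraction: float       # fraction of Read 1 duplicates with the variant
    read2_duplicates: int           # duplicates supporting the variant in Read 2
    read2_mut_fraction: float       # fraction of Read 2 duplicates with the variant
    called_in_wgs: bool             # variant also called in WGS of the sample
    in_dbsnp: bool                  # variant present in the dbSNP builds used
    in_repeat_or_sv: bool           # maps to repetitive DNA or a structural variant
    flagged_artifact: bool          # visual artifact or high frequency mutation


def passes_nuclear_filter(c, min_quality=30.0, min_duplicates=2, min_fraction=0.90):
    """Return True if the candidate survives every criterion of the nuclear filter."""
    supported = (
        c.read1_duplicates >= min_duplicates
        and c.read1_mut_fraction >= min_fraction
        and c.read2_duplicates >= min_duplicates
        and c.read2_mut_fraction >= min_fraction
    )
    excluded = (
        c.called_in_wgs or c.in_dbsnp or c.in_repeat_or_sv or c.flagged_artifact
    )
    return (
        c.genome == "nuclear"
        and c.single_base_substitution
        and c.avg_quality >= min_quality
        and supported
        and not excluded
    )
```

The mtDNA filter has the same shape and differs only in the genome selected and the exclusion databases consulted.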

Various databases were used to align and filter the data, including: dbSNP build 130, Database of Genomic Variants, Segmental Duplications, Fragments of Interrupted Repeats, Simple Tandem Repeats, Repeat Masker, dbSNP build 142, updated Database of Genomic Variants, updated Segmental Duplications, updated Fragments of Interrupted Repeats, updated Simple Tandem Repeats, and updated Repeat Masker. The GRCh37/hg19 genome assembly from the UCSC Human Genome Browser was used.

Fragments of double stranded DNA can be made from longer chain polymers, using any technique known in the art, including but not limited to enzyme digestion, sonication, and shearing. Alternately, some sources of DNA are already fragmented at suitable sizes. Such sources include without limitation saliva, sputum, urine, plasma, and stool. If the source of DNA is already appropriately sized, then the fragments do not need to be further fragmented. Desirably, the fragmentation process, whether endogenous or by human action, is random. The desirable size of fragments may depend on the length of sequencing reads. Fragments may be less than 2 kbp, less than 1500 bp, less than 1 kbp, less than 500 bp, less than 400 bp, less than 200 bp, or less than 100 bp. Fragments may desirably be greater than twice the read length, for example. Fragments may be at least 50 bp, at least 100 bp, at least 150 bp, at least 200 bp, at least 300 bp, at least 400 bp, or at least 500 bp, for example.

In some embodiments, fragments are ligated to adaptors. In some embodiments, each end of a fragment has different adaptors. This can be a laborious process, which may involve much screening and processing to obtain fragments with two distinct adaptors on each end. One way to accomplish this goal is to use Y, U, or hairpin shaped adaptors which contain or can be processed to contain non-complementary sequences on the Watson and Crick strands. If there is a non-complementary region in an adaptor, amplification of the adaptor-ligated fragment will generate double stranded fragments with different adaptor orientation on fragments derived from each strand.

Dilution of libraries of adaptor-ligated fragments can be done using any level of dilution that is appropriate for the source. Less concentrated samples can undergo less dilution and more concentrated samples can undergo more dilution. Complexity of a sample will also factor into the desired degree of dilution. Any dilution series may be used as is convenient, such as two-fold dilutions, five-fold dilutions, ten-fold dilutions, etc. In some embodiments, a dilution level is chosen that will yield ~5-10 members of a family per adapter-ligated fragment. This is influenced by how many fragments are sequenced. For example, at one specific dilution, sequencing ~20 million clusters will yield 1-4 members, but sequencing 75 million clusters will yield 5-10 (see FIG. 55). The more molecules that are sequenced, the higher the number of members that will be found per family. Upon sequencing family members derived from the diluted, adaptor-ligated fragments, one desirably obtains nucleotide sequence of 4-100 family members of an adaptor-ligated fragment.

Dilution may beneficially achieve a relatively low level of coverage of the genome. That is, the genome may be sampled rather than exhaustively and repetitively sequenced. In some embodiments, the dilution is sufficient so that fewer than 10 families from nuclear DNA include 20 or more overlapping nucleotides in the non-adaptor portion. In some embodiments, the dilution is sufficient so that fewer than 5 families from nuclear DNA include 20 or more overlapping nucleotides in the non-adaptor portion. In some embodiments, the dilution is sufficient so that fewer than 10 families comprise the potential rare or potential non-clonal difference detected between a test sequence and a reference sequence. In some embodiments, the dilution is sufficient so that fewer than 5 families comprise the potential rare or potential non-clonal difference detected between a test sequence and a reference sequence.

In some embodiments, dilution accomplishes three features. First, dilution can achieve lower coverage of representative loci to one or a few molecules to “uncover” rare mutations. Second, dilution can increase the chances that both strands of the initial molecules will be sequenced redundantly. Third, dilution can facilitate the random sampling of the genome with minimal amount of sequencing.

Amplification can be performed by any technique known in the art. Typically, polymerase chain reaction will be used. Other techniques, whether linear or logarithmic, may be used.

Typically, primers will be used in the amplification that are complementary to adaptor sequences.

Sequencing can be accomplished by any known technique in the art. A next generation sequencing method may be used. The sequences of the fragments can be aligned to a reference sequence. They can be grouped into families on the basis of an endogenous or an exogenous barcode. An endogenous barcode typically comprises the N nucleotides that are adjacent to the adaptor. The value of N can be chosen as is convenient and provides sufficient diversity/complexity. Exogenous barcodes can be added in a separate ligation step, by amplification primers, or they can be part of the adaptors. In some embodiments, the barcodes are random. In some embodiments, from 2 to 1000 family members are sequenced. In some embodiments, fewer than 100 family members are sequenced. In some embodiments, at least 4 family members are sequenced. In some embodiments, 4 to 10 family members are sequenced.
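
As a minimal, non-limiting sketch of grouping by endogenous barcode, the following Python example forms families from paired reads using the N nucleotides adjacent to the adaptor at each end. The function names and the default of N = 10 are assumptions made for this example only, and the reads are assumed to have already had adaptor sequence removed. In practice, mapping coordinates from alignment to the reference sequence could serve as the grouping key instead.

```python
from collections import defaultdict


def group_into_families(read_pairs, n=10):
    """Group (read1, read2) pairs by an endogenous barcode.

    The barcode is taken here as the first n nucleotides of each read,
    i.e., the bases adjacent to the adaptor at each end of the fragment.
    """
    families = defaultdict(list)
    for read1, read2 in read_pairs:
        barcode = (read1[:n], read2[:n])
        families[barcode].append((read1, read2))
    return dict(families)


def select_families(families, min_members=4, max_members=100):
    """Keep families whose member count lies in [min_members, max_members]."""
    return {
        barcode: members
        for barcode, members in families.items()
        if min_members <= len(members) <= max_members
    }
```

The default bounds of 4 and 100 members mirror the 4-100 family members mentioned herein; either bound can be changed to suit the dilution and sequencing depth.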

According to various methods described herein, one need not separate physically or analyze separately the nuclear and mitochondrial genomes. This permits one to compare rates in the two genomes in the same cells.

Exogenous barcoding may be used to identify individual fragments, samples, tissues, patients, etc. Although the examples provided herein employed endogenous barcoding, this may be supplemented with or replaced by exogenous barcoding. In some embodiments, the complexity of the barcode population is greater than the complexity of the population of fragments to be barcoded such that the barcode represents a particular fragment. Barcodes can be added to a population of fragments using any technique known in the art, including without limitation, by amplification or ligation, or as part of adaptor molecules that are added by ligation. Differences that can be detected between a determined nucleotide sequence and a reference nucleotide sequence include without limitation mutations, such as point mutations, indels (e.g., insertions or deletions of 1-6 bases), and/or substitutions. If the same mutation is found in two different families, then a higher degree of certainty is attached to it (e.g., it is more likely that it arose in the biological sample, rather than in the experimental processing). The two families can have identical sequences deriving from the double stranded fragments, but can have a different orientation with respect to the adaptor sequences. To achieve a higher degree of certainty, one can require that at least two members of each of two families have the sequence difference. To achieve a higher degree of certainty, one can require that 90% or more of the members of a family have the sequence difference.
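
The two certainty criteria above (at least two members of each of two families, and 90% or more of the members of a family) can be combined as in the following non-limiting Python sketch. The function names are hypothetical. The example assumes that the bases observed at the position of interest have already been collected for the Watson-derived and Crick-derived families, and that Crick-derived bases have been converted to the same strand as the reference.

```python
def family_supports(bases, alt, min_members=2, min_fraction=0.90):
    """Return True if enough members of one family carry the difference.

    bases: list of bases observed at one position, one per family member.
    alt:   the base that differs from the reference sequence.
    """
    if not bases:
        return False
    n_alt = sum(1 for base in bases if base == alt)
    return n_alt >= min_members and n_alt / len(bases) >= min_fraction


def duplex_supported(watson_bases, crick_bases, alt,
                     min_members=2, min_fraction=0.90):
    """Require that both strand-derived families support the difference."""
    return (
        family_supports(watson_bases, alt, min_members, min_fraction)
        and family_supports(crick_bases, alt, min_members, min_fraction)
    )
```

A difference seen in only one of the two families is rejected under this rule, which reflects the reasoning that an error introduced during processing is unlikely to appear in both strand-derived families.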

As a means of filtering out germline or clonal mutations, libraries of fragments that have not been amplified and which are from the same sample can be sequenced. Germline and clonal mutations will be evident from inspection because of their repeated occurrences.

BotSeqS methods are simply-implemented NGS-based approaches that can accurately measure rare point mutations in an unbiased, genome-wide manner. Using BotSeqS, several important goals were achieved: (i) estimates of rare mutation frequencies across the whole genome were defined; (ii) rare mutations in both the nuclear and mitochondrial genomes of the same population of cells were simultaneously evaluated; (iii) rare mutation frequencies among various normal tissues of individuals of different age, DNA repair capacity, or exposure histories were compared; and (iv) the spectra of rare mutations in normal tissues were identified, allowing their comparison to those of clonal mutations in cancers.

Data presented herein show that mutations increase with age, a result that is broadly consistent with the literature (Kennedy, S. R., Loeb, L. A. & Herr, A. J. Somatic mutations in aging, cancer and neurodegeneration. Mech Ageing Dev 133, 118-126 (2012); Vijg, J. Somatic mutations, genome mosaicism, cancer and aging. Current opinion in genetics & development 26, 141-149 (2014)). The rate of increase of mutations is not as great in brain as it is in colon or kidney, presumably because the colon and kidney are both self-renewing tissues throughout adult life while the brain is not. On the other hand, the fact that the mutation frequency increased at all after childhood was surprising, given that the major cell types in pre-frontal cortex are generally thought to be post-mitotic (Spalding et al., Retrospective birth dating of cells in humans. Cell 122, 133-143 (2005)). Without being bound by theory, there are several potential explanations for this increase. A small number of cells that are replicating more actively than neurons or glia could be responsible for the increase. Such cells could include microglia or infiltrating lymphocytes or other inflammatory cells. Alternatively, these mutations could represent the results of spontaneous DNA damage independent of DNA replication. A recent single-cell sequencing study of human neurons suggested that spontaneous damage occurs during transcription (Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94-98 (2015)). However, in contrast to single-cell sequencing, BotSeqS measures mutations that are found on both strands. Thus for the explanation of spontaneous DNA damage to be plausible, the mutations identified by BotSeqS would have to have been subject to DNA repair. Consistent with this possibility, DNA repair processes are known to be active in post-mitotic neurons and glia (Madabhushi, R., Pan, L. & Tsai, L. H. DNA damage and its links to neurodegeneration. Neuron 83, 266-282 (2014)).

A third possibility is that these mutations are artifacts of the procedure we used to detect them. It is fascinating that this formal possibility is essentially impossible to exclude because the mutations that were detected are likely found in only one cell of the tissue studied, and the DNA from that cell is no longer available for subsequent evaluation. Additionally, there is no other technique available to observe such mutations with the sensitivity achieved by various BotSeqS methods described herein. The sensitivity is currently limited only by the amount of sequencing devoted to the project. It is easy to detect mutations occurring at 6×10⁻⁸ per bp using a small fraction of a HiSeq™ 2500 flow cell. It is estimated that mutations could be detected at <10⁻⁹ per bp using an entire flow cell. The only other method that approaches this sensitivity has been described by Loeb and colleagues (Schmitt, M. W. et al. Detection of ultra-rare mutations by next-generation sequencing. Proceedings of the National Academy of Sciences of the United States of America 109, 14508-14513 (2012); Kennedy, S. R. et al. Detecting ultralow-frequency mutations by Duplex Sequencing. Nature protocols 9, 2586-2606 (2014)), but this is applicable only to pre-defined regions (~0.001%) of the genome. In the absence of direct confirmation, one is forced to use correlations and other approaches to support the accuracy of the technology described herein. These correlations include the following, as detailed in Table 51: similar mutation frequencies and spectra identified in different DNA aliquots of the same samples; similar mutation frequencies and spectra identified in the same tissues of different individuals of similar age; expected increases in mutation frequencies with age; tissue-specific differences in age-dependent increases in mutation frequencies; higher mutation frequencies in normal tissues deficient in mismatch repair or exposed to environmental mutagens; and mutation spectra in normal tissues consistent with those previously observed in cancers from the same tissues. Other in silico and experimental approaches used to evaluate the accuracy of BotSeqS are described in Example 7.

It was also possible to compare mutation frequencies in the mitochondrial and nuclear genomes of the same tissues. In normal individuals in the absence of exposure to mutagens, the mutation frequency was much higher in the mitochondria than in the nuclear genome (median ratio of 26.2). This is consistent with the relatively poor efficiency of DNA repair in the mitochondria compared to the nuclear genome. Equally important, however, is that the ratio of mitochondrial to nuclear mutation frequencies was vastly lower (median of 1.3) in the normal kidneys of individuals exposed to either cigarette smoke or AA. This finding is not consistent with the known, less efficient repair of DNA in mitochondria. Moreover, there was a shift towards the AA mutational signature, A:T to T:A transversions, in the nuclear DNA of normal kidneys in individuals exposed to AA, but virtually none in the mtDNA. Without being bound by theory, one possibility is that the higher mutation prevalence in the mtDNA could be masking the effect of environmental mutagens on the mitochondrial genome compared to its effect on the nuclear genome. Another possibility is that there are unexpected and pronounced differences in the ways through which these mutagens cause DNA damage in these two organelles.

Another novel finding from the data presented herein is that mutation spectra differed among normal tissues, even in the absence of exposures to known mutagens. Whether such differences reflect varying exposures to as yet unidentified commonly encountered mutagens, or tissue-specific repair processes, is not known. In some embodiments, the rare mutation spectra in normal tissues were found to be similar to the clonal mutations found in cancers. Though varying mutation spectra in cancers have often been attributed to cancer-specific processes, the data presented herein suggest that at least a subset of these mutations actually reflect tissue-specific processes. This concept is consistent with the idea that a substantial fraction of the mutations found in cancers occur in normal stem cells (Tomasetti, C. & Vogelstein, B. Cancer etiology. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 347, 78-81 (2015); Tomasetti, C., Vogelstein, B. & Parmigiani, G. Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proceedings of the National Academy of Sciences of the United States of America 110, 1999-2004 (2013)). Various BotSeqS approaches described herein, which can easily measure very rare mutations in any tissue or cell type of interest, will be applicable to questions of broad biomedical interest.

In some embodiments, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using any of the variety of methods described in U.S. Pat. No. 9,476,095, the contents of which are incorporated by reference herein in their entirety. The identification of mutations that are present in a small fraction of DNA templates is advantageous for progress in several areas of biomedical research. Though massively parallel sequencing instruments are in principle well-suited to this task, the error rates in such instruments are generally too high to allow confident identification of rare variants. Provided herein is an approach that can substantially increase the sensitivity of massively parallel sequencing instruments for this purpose. One example of this approach, called “Safe-SeqS” (for Safe-Sequencing System), includes (i) assignment of a unique identifier (UID) to each template molecule; (ii) amplification of each uniquely tagged template molecule to create UID-families; and (iii) redundant sequencing of the amplification products. PCR fragments with the same UID are considered truly mutant (“super-mutants”) if ≥95% of them contain the identical mutation. This approach is useful for, e.g., determining the fidelity of a polymerase, the accuracy of oligonucleotides synthesized in vitro, and the prevalence of mutations in the nuclear and mitochondrial genomes of normal cells.

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using a unique identifier (UID) nucleic acid sequence attached to a first end of each of a plurality of analyte nucleic acid fragments to form uniquely identified analyte nucleic acid fragments. In such embodiments, the nucleotide sequence of a uniquely identified analyte nucleic acid fragment is redundantly determined, wherein determined nucleotide sequences which share a UID form a family of members, and a nucleotide sequence is identified as accurately representing an analyte nucleic acid fragment when at least 1% of members of the family contain the sequence.

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using a unique identifier sequence (UID) attached to a first end of each of a plurality of analyte DNA fragments using at least two cycles of amplification with first and second primers to form uniquely identified analyte DNA fragments. In such embodiments, the UIDs can be in excess of the analyte DNA fragments during amplification. The first primers can include a first segment complementary to a desired amplicon, a second segment containing the UID, and a third segment containing a universal priming site for subsequent amplification. The second primers can include a universal priming site for subsequent amplification. Each cycle of amplification can attach one universal priming site to a strand. The uniquely identified analyte DNA fragments can be amplified to form a family of uniquely identified analyte DNA fragments from each uniquely identified analyte DNA fragment, and nucleotide sequences of a plurality of members of the family can be determined.

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using endogenous unique identifier sequences (UIDs). For example, fragmented analyte DNA can be obtained that includes fragments of 30 to 2000 bases, inclusive. Each end of a fragment can form an endogenous UID for the fragment. Adapter oligonucleotides can be attached to ends of the fragments to form adapted fragments. Fragments representing one or more selected genes can optionally be enriched by means of capturing a subset of the fragments using capture oligonucleotides complementary to selected genes in the analyte DNA or by amplifying fragments complementary to selected genes. The adapted fragments can be amplified using primers complementary to the adapter oligonucleotides to form families of adapted fragments. Nucleotide sequences can be determined for a plurality of members of a family. Nucleotide sequences of the plurality of members of the family can be compared. A nucleotide sequence can be identified as accurately representing an analyte DNA fragment when at least 1% of members of the family contain the sequence.

In some embodiments, provided herein are compositions including populations of primer pairs, wherein each pair includes a first and second primer for amplifying and identifying a gene or gene portion. The first primer can include a first portion (e.g., of 10-100 nucleotides) complementary to the gene or gene portion and a second portion (e.g., of 10 to 100 nucleotides) including a site for hybridization to a third primer. The second primer can include a first portion (e.g., of 10-100 nucleotides) complementary to the gene or gene portion and a second portion (e.g., of 10 to 100 nucleotides) including a site for hybridization to a fourth primer. In some embodiments, interposed between the first portion and the second portion of the second primer is a third portion consisting of 2 to 4000 nucleotides forming a unique identifier (UID). The unique identifiers in the population can have at least 4 different sequences. The first and second primers can be complementary to opposite strands of the gene or gene portion. A kit may include the population of primers and the third and fourth primers complementary to the second portions of each of the first and second primers.

In some embodiments of methods provided herein, the presence of one or more genetic biomarkers (e.g., mutations) present in a sample (e.g., a cervical, endometrial, urine, saliva, ctDNA, blood, serum, and/or plasma sample) obtained from a subject is detected using an approach called “Safe-SeqS” (from Safe-Sequencing System). In one embodiment, Safe-SeqS involves two basic steps (FIG. 61): the first step is assignment of a Unique Identifier (UID) to each nucleic acid template molecule to be analyzed, and the second step is the amplification of each uniquely tagged template, so that many daughter molecules with the identical sequence are generated (defined as a UID-family). If a mutation pre-existed in the template molecule used for amplification, that mutation should be present in a certain proportion, or even all, of daughter molecules containing that UID (barring any subsequent replication or sequencing errors). A UID-family in which every family member (or a certain predetermined proportion) has an identical mutation is called a “super-mutant.” Mutations not occurring in the original templates, such as those occurring during the amplification steps or through errors in base-calling, should not give rise to super-mutants (e.g., will not be present at the pre-determined frequency in a UID family). In some embodiments, amplification is not necessary.
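
As a non-limiting sketch of the super-mutant concept, the following Python example groups reads into UID-families and reports the families in which a predetermined proportion of members carry the identical non-reference base. The function name and input layout are hypothetical. The 0.95 default follows the ≥95% proportion mentioned herein, whereas the minimum family size of 2 is an assumption made only for this example.

```python
from collections import Counter, defaultdict


def call_super_mutants(reads, ref_base, min_fraction=0.95, min_family_size=2):
    """Return {uid: mutant_base} for UID-families that qualify as super-mutants.

    reads:    iterable of (uid, base) pairs, where base is the base observed
              at the position of interest in one sequenced daughter molecule.
    ref_base: the reference base at that position.
    """
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)

    super_mutants = {}
    for uid, bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few members to assess the family
        base, count = Counter(bases).most_common(1)[0]
        if base != ref_base and count / len(bases) >= min_fraction:
            super_mutants[uid] = base
    return super_mutants
```

A family in which only a minority of members carry a change, as would be expected for an error arising during amplification or base-calling, is not reported.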

Any of the variety of Safe-SeqS approaches can be employed for any purpose where a very high level of accuracy and sensitivity is desired to be obtained from sequence data. As described herein, the approach can be used to assess the fidelity of a polymerase, the accuracy of in vitro nucleic acid synthesis, and the prevalence of mutations in nuclear or mitochondrial nucleic acids of normal cells. The approach may be used to detect and/or quantify mosaics and somatic mutations.

In some embodiments of Safe-SeqS approaches provided herein, fragments of nucleic acids may be obtained using a random fragment forming technique such as mechanical shearing, sonicating, or subjecting nucleic acids to other physical or chemical stresses. Fragments may not be strictly random, as some sites may be more susceptible to stresses than others. Endonucleases that randomly or specifically fragment may also be used to generate fragments. In some embodiments of any of the variety of Safe-SeqS approaches, fragments of nucleic acids may be obtained using a technique that does not result in random fragments. Size of fragments may vary, but desirably will be in ranges between 30 and 5,000 basepairs, between 100 and 2,000, between 150 and 1,000, or within ranges with different combinations of these endpoints. Nucleic acids may be, for example, RNA or DNA. Modified forms of RNA or DNA may also be used.

In some embodiments of Safe-SeqS approaches provided herein, attachment of an exogenous UID to an analyte nucleic acid fragment may be performed by any means known in the art, including enzymatic, chemical, or biologic. One means employs a polymerase chain reaction. Another means employs a ligase enzyme. The enzyme may be mammalian or bacterial, for example. Ends of fragments may be repaired prior to joining using other enzymes such as Klenow Fragment or T4 DNA Polymerase. Other enzymes which may be used for attaching are other polymerase enzymes. A UID may be added to one or both ends of the fragments. A UID may be contained within a nucleic acid molecule that contains other regions for other intended functionality. For example, a universal priming site may be added to permit later amplification. Another additional site may be a region of complementarity to a particular region or gene in the analyte nucleic acids. A UID may be from 2 to 4,000, from 100 to 1,000, or from 4 to 400 bases in length, for example. In some embodiments, a UID is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000 nucleotides or more in length, or can be of any length between these lengths. In embodiments in which two or more UIDs are used, the UIDs can be of the same or different lengths.

In some embodiments of Safe-SeqS approaches provided herein, UIDs may be made using random addition of nucleotides to form a short sequence to be used as an identifier. At each position of addition, a selection from one of four deoxyribonucleotides may be used. Alternatively, a selection from one of three, two, or one deoxyribonucleotides may be used. Thus, the UID may be fully random, somewhat random, or non-random in certain positions. Another manner of making UIDs utilizes pre-determined nucleotides assembled on a chip. In this manner of making, complexity is attained in a planned manner. In some embodiments, it may be advantageous to attach a UID to each end of a fragment, increasing the complexity of the UID population on fragments.
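
For illustration only, the relationship between UID length, the number of nucleotides allowed at each position, and UID complexity can be sketched in Python as follows. The function names are hypothetical, and the software random number generator merely stands in for random nucleotide incorporation during oligonucleotide synthesis.

```python
import random


def random_uid(length, alphabet="ACGT", rng=random):
    """Draw one UID by choosing a nucleotide independently at each position.

    Restricting `alphabet` to three, two, or one letters models a UID that
    is only somewhat random, or non-random, at every position.
    """
    return "".join(rng.choice(alphabet) for _ in range(length))


def uid_complexity(length, choices_per_position=4):
    """Number of distinct UIDs possible for a given length and alphabet size."""
    return choices_per_position ** length
```

Under these assumptions, a fully random UID of 12 bases has 4^12 = 16,777,216 possible sequences, and attaching an independent UID of the same length to each end of a fragment squares that number.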

A cycle of polymerase chain reaction for adding exogenous UID refers to the thermal denaturation of a double stranded molecule, the hybridization of a first primer to a resulting single strand, and the extension of the primer to form a new second strand hybridized to the original single strand. A second cycle refers to the denaturation of the new second strand from the original single strand, the hybridization of a second primer to the new second strand, and the extension of the second primer to form a new third strand, hybridized to the new second strand. Multiple cycles may be employed to increase efficiency, for example, when the analyte is dilute or inhibitors are present.

In the case of endogenous UIDs, adapters can be added to the ends of fragments by any of a variety of methods including, without limitation, ligation. In some embodiments, complexity of the analyte fragments can be decreased by a capture step, either on a solid phase or in a liquid phase. In some embodiments, the capture step employs hybridization to probes representing a gene or set of genes of interest. In some embodiments, if on a solid phase, non-binding fragments are separated from binding fragments. Suitable solid phases known in the art include filters, membranes, beads, columns, etc. In some embodiments, if in a liquid phase, a capture reagent can be added which binds to the probes, for example through a biotin-avidin type interaction. After capture, desired fragments can be eluted for further processing. The order of adding adapters and capturing is not critical. Another non-limiting means of reducing the complexity of the analyte fragments involves amplification of one or more specific genes or regions. One exemplary way to accomplish this is to use inverse PCR. Primers can be used which are gene-specific, thus enriching while forming libraries. Optionally, the gene-specific primers can contain grafting sequences for subsequent attachment to a massively parallel sequencing platform.

Because, in some embodiments, endogenous UIDs provide a limited number of unique possibilities, depending on the fragment size and sequencing read length, combinations of both endogenous and exogenous UIDs can be used. In some embodiments, introducing additional sequences when amplifying increases the available UIDs and thereby increases sensitivity. For example, before amplification, the template can be split into 96 wells, and 96 different primers can be used during the amplification; this would effectively increase the available UIDs 96-fold, because up to 96 templates with the same endogenous UID could be distinguished. This technique can also be used with exogenous UIDs, such that each well's primers add a unique, well-specific sequence to the amplification products, which can also improve the specificity of detection of rare templates.
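
The 96-fold increase can be illustrated with the following minimal Python sketch, in which a well-specific index is paired with the endogenous UID to form a composite identifier. The function names and the representation of an endogenous UID as a string are assumptions made for this example only.

```python
def composite_uid(well_index, endogenous_uid):
    """Pair a well-specific index with an endogenous UID.

    Two templates sharing an endogenous UID remain distinguishable as long
    as they were amplified in different wells.
    """
    return (well_index, endogenous_uid)


def available_uids(n_endogenous_uids, n_wells=96):
    """Number of distinguishable identifiers after splitting into wells."""
    return n_endogenous_uids * n_wells
```

For instance, a single endogenous UID split across 96 wells yields up to 96 distinguishable composite identifiers.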

In some embodiments of Safe-SeqS approaches provided herein, amplification of fragments containing a UID can be performed according to known techniques to generate families of fragments. Polymerase chain reaction can be used. Other amplification methods can also be used, as is convenient. Inverse PCR may be used, as can rolling circle amplification. Amplification of fragments typically is done using primers that are complementary to priming sites that are attached to the fragments at the same time as the UIDs. In some embodiments, priming sites are distal to the UIDs, so that amplification includes the UIDs. In some embodiments, amplification forms a family of fragments, each member of the family sharing the same UID. Because the diversity of UIDs is typically greatly in excess of the diversity of the fragments, each family should derive from a single fragment molecule in the analyte. Primers used for the amplification may be chemically modified to render them more resistant to exonucleases. Non-limiting examples of such modifications include the use of phosphorothioate linkages between one or more 3′ nucleotides, and boranophosphates.

In some embodiments of Safe-SeqS approaches provided herein, family members are sequenced and compared to identify any divergencies within a family. In some embodiments, sequencing is performed on a massively parallel sequencing platform, many of which are commercially available. If the sequencing platform employs a sequence for “grafting,” i.e., attachment to the sequencing device, such a sequence can be added during addition of UIDs or adapters or separately. A grafting sequence may be part of a UID primer, a universal primer, a gene target-specific primer, the amplification primers used for making a family, or separate. Redundant sequencing refers to the sequencing of a plurality of members of a single family.

A threshold can be set for identifying a genetic biomarker (e.g., a mutation) in an analyte. If the “mutation” appears in all members of a family, then it derives from the analyte. If it appears in fewer than all members, then it may have been introduced during the analysis. Thresholds for calling a mutation may be set, for example, at 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97%, 98%, or 100% of the members of a family. Thresholds can be set based on the number of members of a family that are sequenced and the particular purpose and situation.
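
The family-based thresholding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: reads are reduced to (UID, observed base) pairs at a single position, `min_family_size` is a hypothetical parameter that is not specified in the text, and the function name is invented for the example.

```python
from collections import Counter, defaultdict


def call_family_variants(reads, ref_base, threshold=0.95, min_family_size=2):
    """Group reads by UID and call a variant per family.

    reads: iterable of (uid, observed_base) at one position of interest.
    A family is called when the fraction of its members carrying the same
    non-reference base is at least `threshold`.
    min_family_size is an illustrative assumption (redundant sequencing
    needs more than one member per family).
    """
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)

    calls = {}
    for uid, bases in families.items():
        if len(bases) < min_family_size:
            continue
        variants = Counter(b for b in bases if b != ref_base)
        if not variants:
            continue
        base, count = variants.most_common(1)[0]
        if count / len(bases) >= threshold:
            calls[uid] = base
    return calls


reads = (
    [("UID1", "T")] * 3                                   # variant in every member
    + [("UID2", "C"), ("UID2", "C"), ("UID2", "T")]       # variant in one of three
    + [("UID3", "C")] * 4                                 # reference only
)
strict_calls = call_family_variants(reads, ref_base="C", threshold=0.95)
lenient_calls = call_family_variants(reads, ref_base="C", threshold=0.30)
print(strict_calls, lenient_calls)
```

With the stricter threshold only the family in which every member carries the variant is called; the family with a single discordant member is called only when the threshold is lowered, which is the situation in which the change may have been introduced during the analysis.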

In some embodiments of Safe-SeqS approaches provided herein, populationsof primer pairs are used to attach exogenous UIDs. For example, thefirst primer can include a first portion (e.g., of 10-100 nucleotides)complementary to the gene or gene portion and a second portion (e.g., of10 to 100 nucleotides) including a site for hybridization to a thirdprimer. In some embodiments the first portion and/or the second portionof the first primer is 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65,70, 75, 80, 85, 90, 95, or 100 nucleotides in length, or any length inbetween. The second primer can include a first portion (e.g., of 10-100nucleotides) complementary to the gene or gene portion and a secondportion (e.g., of 10 to 100 nucleotides) including a site forhybridization to a fourth primer. In some embodiments the first portionand/or the second portion of the second primer is 10, 15, 20, 25, 30,35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100 nucleotidesin length, or any length in between. In some embodiments, interposedbetween the first portion and the second portion of the second primer isa third portion (e.g., of 2 to 4,000 nucleotides, e.g., 2, 3, 4, 5, 6,7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45,50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350,400, 450, 500, 600, 700, 800, 900, 1000 nucleotides, or any number ofnucleotides between these values) forming a unique identifier (UID). Insome embodiments, interposed between the first portion and the secondportion of both the first and second primer is a third portion (e.g., of2 to 4,000 nucleotides, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75,80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700,800, 900, 1000 nucleotides, or any number of nucleotides between thesevalues), each of which forms a unique identifier (UID). 
In someembodiments, the third portion of the first primer is the same length asthe third portion of the second primer. In some embodiments, the thirdportion of the first primer is a different length than the third portionof the second primer. In some embodiments, the unique identifiers in thepopulation have at least 4, at least 16, at least 64, at least 256, atleast 1,024, at least 4,096, at least 16,384, at least 65,536, at least262,144, at least 1,048,576, at least 4,194,304 at least 16,777,216, orat least 67,108,864 different sequences. In some embodiments, the firstand second primers are complementary to opposite strands of the gene orgene portion. A kit can be made containing both the primers forattaching exogenous UIDs as well as amplification primers, i.e., thethird and fourth primers complementary to the second portions of each ofthe first and second primers. The third and fourth primers canoptionally contain additional grafting or indexing sequences. The UIDmay include randomly selected sequences, pre-defined nucleotidesequences, or both randomly selected sequences and pre-definednucleotides. If both, these can be joined together in blocks orinterspersed.
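
The population sizes listed above (4, 16, 64, ... 67,108,864) equal 4^n for n = 1 to 13, which is the number of distinct sequences of n positions when each position can be any of the four bases. The following minimal sketch makes that arithmetic explicit; it assumes fully random UID positions drawn from A, C, G, and T, and its function names are hypothetical.

```python
import random


def uid_diversity(n_random_positions):
    """Number of distinct UIDs with n fully random positions over A, C, G, T."""
    return 4 ** n_random_positions


def min_random_length(required_uids):
    """Smallest number of random positions giving at least `required_uids` sequences."""
    n = 0
    while 4 ** n < required_uids:
        n += 1
    return n


def random_uid(n_random_positions, rng):
    """Draw one random UID; rng is a random.Random instance for reproducibility."""
    return "".join(rng.choice("ACGT") for _ in range(n_random_positions))


listed_sizes = [uid_diversity(n) for n in range(1, 14)]
example_uid = random_uid(12, random.Random(0))
print(listed_sizes[-1], min_random_length(1_048_576), len(example_uid))
```

A UID that mixes random and pre-defined blocks would have a diversity determined by its random positions only.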

In some embodiments of Safe-SeqS approaches provided herein, the methods of analysis can be used to quantitate as well as to determine a sequence. For example, the relative abundance of two analyte DNA fragments may be compared using methods described herein.

The results described herein demonstrate that the Safe-SeqS approach can substantially improve the accuracy of massively parallel sequencing (Tables 52 and 53). Safe-SeqS can be implemented through either endogenous or exogenously introduced UIDs (or both), and can be applied to virtually any sample preparation workflow or sequencing platform. As demonstrated herein, Safe-SeqS can easily be used to identify rare mutants in a population of DNA templates, to measure polymerase error rates, and to judge the reliability of oligonucleotide syntheses. One of the advantages of the strategy is that it yields the number of templates analyzed as well as the fraction of templates containing variant bases. Previously described in vitro methods for the detection of small numbers of template molecules (e.g., Dressman D, Yan H, Traverso G, Kinzler K W, & Vogelstein B (2003) Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. Proc Natl Acad Sci USA 100:8817-8822; Li J, et al. (2008) Replacing PCR with COLD-PCR enriches variant DNA sequences and redefines the sensitivity of genetic testing. Nat Med 14:579-584) allow the fraction of mutant templates to be determined but cannot determine the number of mutant and normal templates in the original sample.

It is of interest to compare Safe-SeqS to other approaches for reducingerrors in next generation sequencing. Sophisticated algorithms toincrease the accuracy of base-calling have been developed (e.g., (ErlichY, Mitra P, delaBastide M, McCombie W R, & Hannon G J (2008)Alta-Cyclic: a self-optimizing base caller for next-generationsequencing. Nat Methods 5:679-682; Rougemont J, et al. (2008)Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics9:431; Druley T E, et al. (2009) Quantification of rare allelic variantsfrom pooled genomic DNA, Nat Methods 6:263-265; Vallania F L, et al.(2010) High-throughput discovery of rare insertions and deletions inlarge cohorts. Genome Res 20:1711-1718)). These can certainly reducefalse positive calls, but their sensitivity is still limited byartifactual mutations occurring during the PCR steps required forlibrary preparation as well as by (a reduced number of) base-callingerrors. For example, the algorithm employed in the current study usedvery stringent criteria for base-calling and was applied to shortread-lengths, but was still unable to reduce the error rate to less thanan average of 2.0×10⁻⁴ errors/bp. This error frequency is at least aslow as those reported with other algorithms. To improve sensitivityfurther, these base-calling improvements can be used together withSafe-SeqS. Travers et al. have described another powerful strategy forreducing errors (Eid J, et al. (2009) Real-time DNA sequencing fromsingle polymerase molecules. Science 323:133-138). With this technology,both strands of each template molecule are sequenced redundantly after anumber of preparative enzymatic steps. However, this approach can onlybe performed on a specific instrument. Moreover, for many clinicalapplications, there are relatively few template molecules in the initialsample and evaluation of nearly all of them is required to obtain therequisite sensitivity. 
In some embodiments, approaches described herein that employ exogenously introduced UIDs (FIG. 63) address this concern by coupling the UID assignment step with a subsequent amplification in which few molecules are lost.

Strong evidence that the mutations identified by conventional analyses in the current study represent artifacts rather than true mutations in the original templates is provided by the observation that the mutation prevalence in all but one experiment was similar: 2.0×10⁻⁴ to 2.4×10⁻⁴ mutations/bp (Tables 52 and 53). The exception was the experiment with oligonucleotides synthesized from phosphoramidites, in which the error rate of the synthetic process was apparently higher than the error rate of conventional Illumina analysis when used with stringent base-calling criteria. In contrast, the mutation prevalence of Safe-SeqS varied much more, from 0.0 to 1.4×10⁻⁵ mutations/bp, depending on the template and experiment. Moreover, the mutation prevalence measured by Safe-SeqS in the most controlled experiment, in which polymerase fidelity was measured (Table 53A), was almost identical to that predicted from previous experiments in which polymerase fidelity was measured by biological assays. Measurements of mutation prevalence in the DNA from normal cells provided herein are consistent with some previous experimental data. However, estimates of these prevalences vary widely and may depend on cell type and sequence analyzed (see SI text). It therefore cannot be said with certainty that the few mutations revealed by Safe-SeqS represented errors occurring during the sequencing process rather than true mutations present in the original DNA templates. Potential sources of error in the Safe-SeqS process are described in the SI text.

Another potential application of Safe-SeqS is the minimization of PCR contamination, a serious problem for clinical laboratories. With endogenous or exogenous UID assignment, the UIDs of mutant templates can simply be compared to those identified in prior experiments; the probability that the same mutation from two independent samples would have the same UID in different experiments is negligible when mutations are infrequent. Additionally, with exogenous UIDs, a control experiment with the same template but without the UID assigning PCR cycles (FIG. 63) can ensure that no DNA contamination is present in that template preparation; no template should be amplified in the absence of UID assignment cycles and thus no PCR product of the proper size should be observed.
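
The comparison of UIDs against prior experiments can be sketched as a simple set intersection. The representation of a mutant template as a (UID, mutation label) pair, the function name, and the example UIDs and labels are all hypothetical and chosen only for illustration.

```python
def reused_uid_mutations(current_run, prior_runs):
    """Return (uid, mutation) pairs in the current run that were already seen in a prior run.

    A match suggests possible carry-over contamination, because an independent
    sample is unlikely to yield the same mutation with the same UID.
    """
    seen_before = set()
    for run in prior_runs:
        seen_before.update(run)
    return sorted(set(current_run) & seen_before)


# Illustrative placeholder data, not experimental results.
prior_runs = [
    {("ACGTTGCA", "pos101:C>T"), ("GGATCCTA", "pos250:G>A")},
    {("TTGACCGA", "pos250:G>A")},
]
current_run = {("ACGTTGCA", "pos101:C>T"), ("CCATGGTT", "pos250:G>A")}

suspect = reused_uid_mutations(current_run, prior_runs)
print(suspect)
```

Here the same mutation observed with a new UID is not flagged, while the pair that repeats both the UID and the mutation is.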

It was demonstrated that the exogenous UIDs strategy can be used to analyze a single amplicon in depth. This technology may not be applicable to situations wherein multiple amplicons must be analyzed from a sample containing a limited number of templates. Multiplexing in the UID assignment cycles (FIG. 63) may provide a solution to this challenge. A second potential concern is that clinical samples may contain inhibitors that reduce the efficiency of this step. This problem can presumably be overcome by performing more than two cycles in the UID assignment PCR step (FIG. 63), though this has the potential to complicate the determination of the number of templates analyzed. The specificity of Safe-SeqS is currently limited by the fidelity of the polymerase used in the UID assignment PCR step, i.e., 8.8×10⁻⁷ mutations/bp in its current implementation with two cycles. Increasing the number of cycles in the UID assignment PCR step to five would decrease the overall specificity to ~2×10⁻⁶ mutations/bp. However, this specificity can be increased by requiring more than one super-mutant for mutation identification; the probability of introducing the same artifactual mutation twice or three times would be exceedingly low ([2×10⁻⁶]² or [2×10⁻⁶]³, respectively). In sum, there are several simple ways to vary the Safe-SeqS procedure and its analysis to meet the needs of specific experiments.
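
The figures in the preceding paragraph can be checked with a short calculation. The sketch below assumes, as one reading that reproduces the stated numbers, that the polymerase-limited error rate scales linearly with the number of UID assignment cycles and that independent artifacts multiply; both are modeling assumptions for illustration, and the function names are hypothetical.

```python
def scaled_error_rate(rate_at_ref, ref_cycles, cycles):
    """Error rate after `cycles` UID assignment cycles, assuming linear scaling with cycle number."""
    return rate_at_ref * cycles / ref_cycles


def repeated_artifact_probability(rate, k):
    """Probability of the same artifactual mutation arising independently k times."""
    return rate ** k


two_cycle_rate = 8.8e-7  # mutations/bp with two cycles, as stated in the text
five_cycle_rate = scaled_error_rate(two_cycle_rate, ref_cycles=2, cycles=5)

twice = repeated_artifact_probability(2e-6, 2)
thrice = repeated_artifact_probability(2e-6, 3)
print(five_cycle_rate, twice, thrice)
```

Under these assumptions the five-cycle rate is about 2.2×10⁻⁶ mutations/bp, consistent with the approximate value of 2×10⁻⁶ given above, and requiring two or three independent super-mutants lowers the chance of a shared artifact to roughly 4×10⁻¹² or 8×10⁻¹⁸.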

Luria and Delbrück, in their classic paper in 1943, wrote that their “prediction cannot be verified directly, because what we observe, when we count the number of resistant bacteria in a culture, is not the number of mutations which have occurred but the number of resistant bacteria which have arisen by multiplication of those which mutated, the amount of multiplication depending on how far back the mutation occurred.” Various Safe-SeqS procedures described here can verify such predictions because the number as well as the time of occurrence of each mutation can be estimated from the data, as noted in the experiments on polymerase fidelity. In addition to templates generated by polymerases in vitro, the same approach can be applied to DNA from bacteria, viruses, and mammalian cells. It is therefore expected that this strategy will provide definitive answers to a variety of important biomedical questions.

In some embodiments, a genetic biomarker (e.g., one or more genetic biomarkers) is detected using any of the variety of methods described in U.S. Patent Application Publication No. 2018/0208999, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include analysis of the count, the fragmentation patterns, and size of cell-free nucleic acids, e.g., plasma DNA and serum DNA, including nucleic acids from pathogens, such as viruses. Various embodiments are directed to applications (e.g., classification of biological samples) of the analysis of the count, the fragmentation patterns, and size of cell-free nucleic acids, e.g., plasma DNA and serum DNA, including nucleic acids from pathogens, such as viruses. Some embodiments of the application can determine if a subject has a particular condition. For example, a method can determine if a subject has cancer or a tumor, or other pathology. Embodiments of another application can be used to assess the stage of a condition, or the progression of a condition over time. For example, a method may be used to determine a stage of cancer in a subject, or the progression of cancer in a subject over time (e.g., using samples obtained from a subject at different times). According to one embodiment, sequence reads obtained from a sequencing of the mixture of cell-free nucleic acid molecules can be used to determine an amount of the sequence reads aligning to a reference genome corresponding to the virus. The amount of sequence reads aligning to the reference genome can be compared to a cutoff value to screen for the pathology. According to another embodiment, sizes of viral nucleic acid molecules (e.g., those aligning to a reference genome corresponding to the virus) can be used. A statistical value of a size distribution of the nucleic acid molecules from the virus can be determined. A level of pathology in the subject can be determined by processing the statistical value against a cutoff value. According to another embodiment, a first amount of cell-free nucleic acid molecules that end within one or more first windows of a reference genome corresponding to the virus is determined. Each first window comprises at least one of a first set of genomic positions at which ends of cell-free nucleic acid molecules are present at a rate above a first threshold in subjects with a cancer (or other pathology) associated with the virus. A relative abundance can be computed by normalizing the first amount using a second amount of cell-free nucleic acid molecules, which includes cell-free nucleic acid molecules ending at a second set of genomic positions outside of the one or more first windows including the first set of genomic positions. A level of cancer in the subject can be determined by processing the relative abundance against a cutoff value. Embodiments can combine various techniques. For example, a first assay can be count-based, size-based, or fragmentation-based. A second assay can be one of the other techniques. As examples, majority voting can be used, or cutoff values can be determined for both techniques, thereby determining a set of data points from the two techniques that correspond to a particular level of pathology.
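
The fragmentation-based calculation described above (a first amount of fragment ends inside selected windows, normalized by a second amount of ends at positions outside those windows, then compared to a cutoff) can be sketched as follows. The coordinates, windows, and cutoff value are hypothetical placeholders, the function names are invented, and a simple ratio is only one possible way of normalizing.

```python
def relative_end_abundance(end_positions, first_windows, second_positions):
    """Ratio of fragment ends inside the first windows to ends at the second set of positions.

    end_positions: end coordinates of cell-free fragments aligned to the viral reference.
    first_windows: list of (low, high) inclusive windows around the first set of positions.
    second_positions: positions outside the first windows used for normalization.
    """
    def in_first_window(pos):
        return any(low <= pos <= high for low, high in first_windows)

    first_amount = sum(1 for p in end_positions if in_first_window(p))
    second_set = set(second_positions)
    second_amount = sum(
        1 for p in end_positions if p in second_set and not in_first_window(p)
    )
    if second_amount == 0:
        raise ValueError("no fragment ends at the normalizing positions")
    return first_amount / second_amount


def exceeds_cutoff(value, cutoff):
    """Compare a relative abundance to a cutoff value (the cutoff is assay-specific)."""
    return value > cutoff


# Hypothetical coordinates for illustration only.
ends = [105, 107, 110, 110, 300, 305]
windows = [(100, 110)]
normalizing_positions = [300, 305, 900]

abundance = relative_end_abundance(ends, windows, normalizing_positions)
flagged = exceeds_cutoff(abundance, cutoff=1.5)
print(abundance, flagged)
```

A count-based or size-based assay could be compared to its own cutoff in the same way, and the individual results combined, for example by majority voting.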

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Patent Application Publication No. 2018/0203974, which is herebyincorporated by reference in its entirety. For example, detection of agenetic biomarker can include a computer-implemented method, involvingreceiving a data set in a computer comprising a processor and acomputer-readable medium, where the computer-readable medium comprisesinstructions that, when executed by the processor, cause the computer toe.g. identify somatic mutations in the biological test sample; andgenerate a somatic mutational profile that comprises the somaticmutations; and detecting the presence of the cancer in the patient basedon the exposure weights of the mutational signatures. Additionally oralternatively, detection of a genetic biomarker can include using anon-negative matrix factorization (NMF) approach to construct asignature matrix that can be used to identify latent signatures in apatient. In other embodiments, the methods may use principal componentsanalysis (PCA) or vector quantization (VQ) approaches to construct asignature matrix. In one example, the patient sample is a cell-freenucleic acid sample (e.g., cell-free DNA (cfDNA) and/or cell-free RNA(cfRNA)). The construction of a signature matrix using non-negativematrix factorization can be generalized to multiple features relevant tocancer detection and/or classification. In some embodiments, a signaturematrix comprises a plurality of signatures where the probability of theoccurrence for each of a plurality of features are represented. 
Examplesof relevant features include, but are not limited to, an upstreamsequence context of a base substitution mutation, a downstream sequencecontext of a base substitution mutation, an insertion, a deletion, asomatic copy number alteration (SCNA), a translocation, a genomicmethylation status, a chromatin state, a sequencing depth of coverage,an early versus late replicating region, a sense versus antisensestrand, an inter mutation distance, a variant allele frequency, afragment start/stop, a fragment length, and a gene expression status, orany combination thereof. In one embodiment, the upstream and/ordownstream sequence context can comprise a region of a nucleic acid thatranges in length from about 2 to about 40 bp, such as from about 3 toabout 30 bp, such as from about 3 to about 20 bp, or such as from about2 to about 10 bp of sequence context of a base substitution mutation. Inone embodiment, the upstream and/or downstream sequence context may be atriplet sequence context, a quadruplet sequence context, a quintupletsequence context, a sextuplet sequence context, or a septuplet sequencecontext of base substitution mutations. In some embodiments, theupstream and/or downstream sequence context can be the triplet sequencecontext of a base substitution mutation. In one embodiment, the methodsare used to identify latent somatic mutational signatures in a subject's(e.g., an asymptomatic subject) cfDNA sample for early detection ofcancer. In another embodiment, the methods are used to infer tissue oforigin for a patient's cancer based on latent mutational signaturesidentified in the patient's cfDNA sample. In yet another embodiment, themethods are used to identify latent mutational signatures in a patient'scfDNA sample that can be used to classify the patient for differenttypes of therapies. In yet another embodiment, non-negative matrixfactorization is applied to learn error modes in a somatic variant(mutation) calling assay. 
For example, systematic errors (e.g., errorscontributed during library preparation, PCR, hybridization capture,and/or sequencing) that underlie the assay can be identified andassigned unique signatures that can be used to distinguish between thecontribution from true somatic variants and artifactual variants arisingfrom the technical processes in the assay. In yet another embodiment,non-negative matrix factorization can be used to identify mutationalsignatures that are associated with healthy aging. Mutation processesthat are associated with aging are assigned mutational signatures thatcan be used to distinguish between healthy somatic mutations associatedwith patient age and somatic mutations contributed from, and indicativeof, a cancer process in the patient. In another embodiment, one or moremutational signatures can be monitored over time and used fordiagnosing, monitoring, and/or classifying cancer. For example, theobserved mutational profile in cfDNA from patient samples at two or moretime points can be evaluated. In some embodiments, two or moremutational signature processes can be evaluated as a combination ofdifferent mutational signatures. In still another embodiment, one ormore mutational signatures can be monitored over time (e.g., at aplurality of time points) to monitor the effectiveness of a therapeuticregimen or other cancer treatment. Somatic mutations (i.e., drivermutations and passenger mutations) in a cancer genome are typically thecumulative consequence of one or more mutational processes of DNA damageand repair. Although not wishing to be bound by theory, it is believedthat the strength and duration of exposure to each mutational process(e.g., environmental factors and DNA repair processes) results in aunique profile of somatic mutations in a subject (e.g., a cancerpatient). These unique combinations of mutation types form a unique“mutational signature” for the cancer patient. 
Furthermore, as is wellknown in the art, a somatic mutation, or mutational profile can dependon the particular sequence context of the mutation. For example, UVdamage typically results in a base change of C to T, when the basechange occurs within a sequence context of (-T|C|-) C(A|T|C|G). In thisexample, C is the mutated base and the bases upstream (T or C) anddownstream (A, T, C, or G) of C affect the probability of a mutationunder UV radiation. In another example, spontaneous deamination of5-methylcytosine typically results in a base change of C to T, when thebase change occurs within a sequence context of (A|T|C|G)C(-|-|-|G).Accordingly, in one embodiment, the sequence context of identifiedmutations can be utilized as a feature for analyzing somatic mutationsin the detection and/or classification of cancer.
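
The use of sequence context as a feature can be illustrated by tabulating the triplet context of each base substitution. This is a minimal sketch: the label format `T[C>T]G` is a commonly used shorthand adopted here for illustration, the reference string and mutation list are made-up inputs, and the sketch does not collapse strand-equivalent contexts or perform any signature decomposition.

```python
from collections import Counter


def substitution_context(reference, position, alt):
    """Return the triplet context label, e.g. 'T[C>T]G', for a substitution.

    position is a 0-based index into `reference`; one flanking base is needed
    on each side.
    """
    if position < 1 or position >= len(reference) - 1:
        raise ValueError("position needs one flanking base on each side")
    ref = reference[position]
    if alt == ref:
        raise ValueError("alt base must differ from the reference base")
    upstream = reference[position - 1]
    downstream = reference[position + 1]
    return f"{upstream}[{ref}>{alt}]{downstream}"


def context_profile(reference, mutations):
    """Count substitutions per triplet context; mutations are (position, alt) pairs."""
    return Counter(substitution_context(reference, pos, alt) for pos, alt in mutations)


reference = "ATCGATTCAGG"            # made-up reference sequence
mutations = [(2, "T"), (7, "T"), (4, "G")]

profile = context_profile(reference, mutations)
print(profile)
```

A vector of such counts per sample is the kind of feature table to which a factorization approach could then be applied.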

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2018/119399, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include methods for preparing a sequencing library from aDNA-containing test sample, including methods for rescuing one or morepartially ligated DNA fragments to enhance library preparationconversion efficiencies. The methods can further be used to improverecovery of duplex sequence information from double-stranded DNA.Additionally or alternatively, detection of a genetic biomarker caninclude a method for preparing a double-stranded DNA sequencing library,the method comprising the following steps: (a) obtaining a test samplecomprising a plurality of double-stranded DNA (dsDNA) fragments, whereinthe dsDNA fragments comprise a forward strand and a reverse strand; (b)ligating double-strand DNA adapters to both ends of the dsDNA fragments;and (c) extending unligated 3′-ends of the dsDNA fragments with a DNApolymerase to create dsDNA fragment-adapter templates to prepare asequencing library. In some embodiments, the dsDNA fragment-adaptertemplates are further amplified prior to sequencing. In otherembodiments, one or more steps of the method may be carried out in asingle reaction step. For example, steps (b) through (c) may be carriedout in a single reaction tube utilizing a reaction mixture comprising afirst set of dsDNA adapters, a ligase, a polymerase (optionally havingstrand-displacement activity), a terminal deoxynucleotidyl transferase,and a second set of ssDNA oligonucleotides or primers (e.g., includingsequencing adapters and/or a universal primer). Optionally, the dsDNAmolecules can be purified, and optionally fragmented, from test sampleprior to ligation step (b). 
Additionally or alternatively, detection ofa genetic biomarker can include a method for preparing a double-strandedDNA sequencing library, the method comprising the following steps: (a)obtaining a test sample comprising a plurality of double-stranded DNA(dsDNA) fragments, the dsDNA fragments comprising a forward strand and areverse strand; (b) adding double-stranded adapters to the dsDNAfragments and ligating the double-strand adapters to both ends of thedsDNA fragments; (c) extending unligated 3′-ends of the dsDNA fragmentswith a polymerase to create dsDNA fragment-adapter templates, whereinthe polymerase further comprises strand displacement activity; (d)adding a poly-adenine tail to the 3′-ends of the dsDNA fragment-adaptertemplates; (e) adding a set of ssDNA oligonucleotides (or primers) andhybridizing the ssDNA oligonucleotides to the dsDNA fragment-adaptertemplates; and (f) extending the set of ssDNA oligonucleotides to createa dsDNA sequencing library. In some embodiments, one or more steps ofthe method may be carried out in a single reaction step. For example,steps (b) through (f) may be carried out in a single reaction tubeutilizing a reaction mixture comprising a first set of dsDNA adapters, aligase, a polymerase (optionally having strand-displacement activity), aterminal deoxynucleotidyl transferase, and a second set of ssDNAoligonucleotides or primers (e.g., including sequencing adapters and/ora universal primer). Optionally, the dsDNA molecules can be purified,and optionally fragmented, from test sample prior to ligation step (b).

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2018/119438, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include methods of analyzing sequencing data to detectCNVs in a nucleic acid sample. Detecting CNVs in a nucleic acid sampleobtained from a human subject can be informative for determining apresence of cancer in the subject. In one embodiment, detecting CNVs ina nucleic acid sample obtained from a human subject can be used forearly detection of cancer in the subject. In various embodiments, themethods determine coverage at individual nucleotide bases determinedfrom targeted sequencing reads. Sources of coverage variation can becorrected at the base level. For each gene of a targeted gene panel, thedetermined base level coverage across bases of the gene can beconsidered to more effectively detect CNVs of each gene. Generally,baseline coverage biases that exist at each base position can be modeledusing training data gathered from healthy individuals. Therefore, whenanalyzing a test sample obtained from a subject, the base level coveragecan be determined for each base position in view of the expectedcoverage biases obtained through modeling. Specifically, if the coveragebias at a base position for a test sample obtained from the subjectdiffers from the expected coverage bias obtained through modeling,coverage biases can be normalized and removed. For a gene in a targetedgene panel, base level coverages across the base positions of the geneare analyzed to determine whether the coverage for the gene differs froman expected level of coverage for the gene as previously determinedusing training data gathered from healthy individuals. If so, a CNV canbe called. 
The calling of a CNV can indicate a presence of cancer in the subject or that the subject has an increased likelihood of developing cancer.
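
The base-level coverage approach described above can be sketched in simplified form. The sketch assumes that coverage values are already depth-normalized, models the expected coverage bias at each base as a single baseline value learned from healthy individuals, and uses a z-score with a hypothetical cutoff of 3 to decide whether a gene deviates; none of these specifics is taken from the referenced publication.

```python
from statistics import mean, stdev


def normalize_coverage(test_coverage, expected_coverage):
    """Divide the coverage at each base by the expected (baseline) coverage at that base."""
    return [t / e for t, e in zip(test_coverage, expected_coverage)]


def gene_cnv_call(test_coverage, expected_coverage, healthy_gene_levels, z_cutoff=3.0):
    """Return (z, called) for one gene.

    healthy_gene_levels: gene-level normalized coverage values from healthy
    training samples. z_cutoff is an illustrative assumption.
    """
    gene_level = mean(normalize_coverage(test_coverage, expected_coverage))
    mu = mean(healthy_gene_levels)
    sd = stdev(healthy_gene_levels)
    z = (gene_level - mu) / sd
    return z, abs(z) > z_cutoff


# Made-up numbers for illustration only.
expected = [100, 120, 80, 100]
healthy_levels = [0.95, 1.0, 1.05, 1.0, 1.0]

amplified_z, amplified_called = gene_cnv_call([150, 180, 120, 150], expected, healthy_levels)
normal_z, normal_called = gene_cnv_call(expected, expected, healthy_levels)
print(amplified_z, amplified_called, normal_called)
```

In the toy data the first sample has 1.5-fold coverage across the gene and is called, while a sample matching the baseline is not.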

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2018/111872, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include preparing sequencing libraries based on aplurality of RNA molecules which are tagged and amplified by tagging themolecule with an oligonucleotide hybridising to a polyC tail introducedby the terminal transferase activity of the reverse transcriptase, e.g.MMLV RT, and ligating the oligonucleotide to the RNA molecule using aligase, e.g. T4 RNA ligase, and producing cDNA molecules based on mRNAand strand displacing reverse transcriptases, e.g. MMLV RT, andproducing a second cDNA strand in order to produce a dsDNA library forsequencing. In some embodiments, the methods comprise sequencing atleast a portion of a sequencing library to obtain sequencing data orsequence reads from a test sample (e.g., a biological sample from asubject). In one embodiment, the method for preparing a sequencinglibrary from a test sample comprising RNA comprises the steps: (a)obtaining a test sample comprising RNA sequences, and purifying the RNAsequences from the test sample; (b) synthesizing first complementary DNA(cDNA) strands based on the RNA sequences and C-tailing 3′-ends of cDNAstrands; (c) annealing a complementary template switchingoligonucleotide to the C-tail of the cDNA and ligating the complementarytemplate switching oligonucleotide to the 5′-ends of the RNA sequencesto produce RNA templates; and (d) synthesizing a plurality of cDNAstrands from the RNA templates using a strand-displacement reversetranscriptase. In some embodiments, one or more steps of the method maybe carried out in a single reaction step. 
For example, steps (b) through(d) may be carried out in a single reaction tube utilizing a reactionmixture comprising RNA primers (e.g., random hexamer RNA primers, polyTprimers, or a combination thereof), a strand-displacement reversetranscriptase (e.g., MMLV reverse transcriptase), an RNA ligase (e.g.,T4 RNA ligase), and optionally, a polynucleotide kinase (e.g., T4polynucleotide kinase). In some embodiments, the method for preparing asequencing library from a test sample comprising RNA, comprises thesteps: (a) obtaining a test sample comprising one or more RNA sequences,and purifying the one or more RNA sequences from the test sample; (b)annealing a first RNA primer to the one or more RNA sequences; (c)extending the first RNA primer in a first nucleic acid extensionreaction using reverse transcriptase, wherein the reverse transcriptasecomprises reverse transcription and terminal transferase activities, togenerate a plurality of DNA sequences complementary to the one or moreRNA templates, and wherein the complementary DNA (cDNA) sequencesfurther comprise a plurality of non-templated bases at the 3′-end of thecDNA sequences; (d) annealing a complementary nucleic acid sequence tothe non-templated bases at the 3′-end of the cDNA sequence, wherein thecomplementary nucleic acid sequence further comprises a unique molecularidentifier (UMI) or a unique sequence tag; (e) ligating thecomplementary nucleic acid sequence to the 5′-end of the one or more RNAsequences to generate one or more RNA templates, wherein the one or moreRNA templates comprise the original one or more RNA sequences covalentlylinked to the complementary nucleic acid sequence comprising the UMI orunique sequence tag; (f) annealing one or more second RNA primers to theone or more RNA template; and (g) extending the one or more second RNAprimers in a second nucleic acid extension reaction using astrand-displacement reverse transcriptase to generate a plurality of DNAsequence complementary to the one or 
more RNA templates, wherein theplurality of complementary DNA (cDNA) sequences each comprise thecomplementary DNA sequence and a UMI or unique sequence tag. In someembodiments, one or more steps of the method may be carried out in asingle reaction step. For example, in some embodiments, steps (b)through (g) may be carried out in a single reaction tube utilizing areaction mixture comprising RNA primers (e.g., random hexamer RNAprimers, polyT primers, or a combination thereof), a strand-displacementreverse transcriptase (e.g., MMLV reverse transcriptase), an RNA ligase(e.g., T4 RNA ligase), and optionally, a polynucleotide kinase (e.g., T4polynucleotide kinase). In one embodiment, the method involves preparinga sequencing library from a test sample comprising RNA molecules, themethod comprising the steps: (a) obtaining a test sample comprising oneor more RNA sequences, and purifying the one or more RNA sequences fromthe test sample; (b) annealing a first RNA primer to the one or more RNAsequences; (c) extending the first RNA primer in a first nucleic acidextension reaction using a reverse transcriptase, wherein the reversetranscriptase comprises reverse transcription and terminal transferaseactivities, to generate a plurality of DNA sequences complementary tothe one or more RNA sequences, wherein the terminal transferase activityadds a cytosine (C) tail to the 3′-end of the complementary DNA (cDNA)sequences; (d) annealing a template switching oligonucleotide to the3′-cytosine tail of the cDNA sequence, wherein the template switchingoligonucleotide comprises a unique molecular identifier (UMI) or aunique sequence tag; (e) ligating the template switching oligonucleotideto the 5′-end of the one or more RNA sequences with T4 RNA ligase togenerate one or more RNA templates, wherein the one or more RNAtemplates comprise the original one or more RNA sequences covalentlylinked to the template switching oligonucleotide and the UMI or uniquesequence tag; (f) annealing a 
plurality of second RNA primers to the one or more RNA templates; and (g) extending the plurality of second RNA primers in a second nucleic acid extension reaction using a strand-displacement reverse transcriptase to generate a plurality of DNA sequences complementary to the one or more RNA templates, wherein the plurality of complementary DNA (cDNA) sequences each comprise the complementary DNA sequence and a UMI or unique sequence tag. In some embodiments, one or more steps of the method can be carried out in a single reaction step. For example, steps (b) through (g) may be carried out in a single reaction tube utilizing a reaction mixture comprising RNA primers (e.g., random hexamer RNA primers, polyT primers, or a combination thereof), a strand-displacement reverse transcriptase (e.g., MMLV reverse transcriptase), an RNA ligase (e.g., T4 RNA ligase), and optionally, a polynucleotide kinase (e.g., T4 polynucleotide kinase).

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/085862, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and systems for identifying somatic mutational signatures for detecting, diagnosing, monitoring and/or classifying cancer in a patient known to have, or suspected of having, cancer. In various embodiments, the methods use a non-negative matrix factorization (NMF) approach to construct a signature matrix that can be used to identify latent signatures in a patient sample for detection and classification of cancer. In other embodiments, the methods may use principal components analysis (PCA) or vector quantization (VQ) approaches to construct a signature matrix. In one example, the patient sample is a cell-free nucleic acid sample (e.g., cell-free DNA (cfDNA) and/or cell-free RNA (cfRNA)). The construction of a signature matrix using non-negative matrix factorization can be generalized to multiple features relevant to cancer detection and/or classification. In some embodiments, a signature matrix comprises a plurality of signatures where the probability of occurrence for each of a plurality of features is represented. Examples of relevant features include, but are not limited to, an upstream sequence context of a base substitution mutation, a downstream sequence context of a base substitution mutation, an insertion, a deletion, a somatic copy number alteration (SCNA), a translocation, a genomic methylation status, a chromatin state, a sequencing depth of coverage, an early versus late replicating region, a sense versus antisense strand, an inter-mutation distance, a variant allele frequency, a fragment start/stop, a fragment length, and a gene expression status, or any combination thereof.
In one embodiment, the upstream and/or downstream sequence context can comprise a region of a nucleic acid that ranges in length from about 2 to about 40 bp, such as from about 3 to about 30 bp, such as from about 3 to about 20 bp, or such as from about 2 to about 10 bp of sequence context of a base substitution mutation. In one embodiment, the upstream and/or downstream sequence context may be a triplet sequence context, a quadruplet sequence context, a quintuplet sequence context, a sextuplet sequence context, or a septuplet sequence context of base substitution mutations. In some embodiments, the upstream and/or downstream sequence context can be the triplet sequence context of a base substitution mutation. In one embodiment, the methods are used to identify latent somatic mutational signatures in a subject's (e.g., an asymptomatic subject) cfDNA sample for early detection of cancer. In another embodiment, the methods are used to infer tissue of origin for a patient's cancer based on latent mutational signatures identified in the patient's cfDNA sample. In yet another embodiment, the methods are used to identify latent mutational signatures in a patient's cfDNA sample that can be used to classify the patient for different types of therapies. In yet another embodiment, non-negative matrix factorization is applied to learn error modes in a somatic variant (mutation) calling assay. For example, systematic errors (e.g., errors contributed during library preparation, PCR, hybridization capture, and/or sequencing) that underlie the assay can be identified and assigned unique signatures that can be used to distinguish between the contribution from true somatic variants and artifactual variants arising from the technical processes in the assay. In yet another embodiment, non-negative matrix factorization can be used to identify mutational signatures that are associated with healthy aging.
Mutation processes that are associated with aging are assigned mutational signatures that can be used to distinguish between healthy somatic mutations associated with patient age and somatic mutations contributed from, and indicative of, a cancer process in the patient. In another embodiment, one or more mutational signatures can be monitored over time and used for diagnosing, monitoring, and/or classifying cancer. For example, the observed mutational profile in cfDNA from patient samples at two or more time points can be evaluated. In some embodiments, two or more mutational signature processes can be evaluated as a combination of different mutational signatures. In still another embodiment, one or more mutational signatures can be monitored over time (e.g., at a plurality of time points) to monitor the effectiveness of a therapeutic regimen or other cancer treatment. In one embodiment, the sequence context of identified mutations can be utilized as a feature for analyzing somatic mutations in the detection and/or classification of cancer.
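
By way of non-limiting illustration only, the construction of a signature matrix by non-negative matrix factorization can be sketched in Python as follows. The sketch assumes a feature-by-sample count matrix `V` and factors it into a signature matrix `W` (each column normalized to a probability distribution over features) and an exposure matrix `H`. The multiplicative update rule, the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`), the rank, and the synthetic data are all assumptions made for this example and are not taken from the incorporated publication.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH (reconstruction error)."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (features x samples) into W (features x rank) and H (rank x samples).

    Uses multiplicative updates for the squared-error objective; this is one
    common NMF algorithm, chosen here purely for illustration.
    """
    rng = random.Random(seed)
    n_features, n_samples = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_features)]
    h = [[rng.random() + eps for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samples)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_features)]
    # Rescale so each signature (column of W) sums to 1; H absorbs the scale,
    # leaving the product WH unchanged.
    for k in range(rank):
        total = sum(w[i][k] for i in range(n_features)) or 1.0
        for i in range(n_features):
            w[i][k] /= total
        h[k] = [x * total for x in h[k]]
    return w, h


# Synthetic example (hypothetical data): six features, four samples, two
# underlying signatures mixed with different exposures per sample.
true_signatures = [[0.5, 0.0], [0.3, 0.0], [0.2, 0.0],
                   [0.0, 0.2], [0.0, 0.3], [0.0, 0.5]]
true_exposures = [[80, 20, 50, 5], [10, 60, 50, 90]]
V = matmul(true_signatures, true_exposures)
W, H = nmf(V, rank=2)
```

In this sketch, each column of `W` plays the role of one latent signature and each column of `H` gives the estimated contribution of every signature to one sample. In practice the rank and the feature set (e.g., triplet sequence contexts) would be chosen for the assay at hand.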

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0163201, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for preparing sequencing libraries comprising a plurality of RNA molecules. In one embodiment, the method for preparing a sequencing library from a test sample comprising RNA comprises the steps: (a) obtaining a test sample comprising RNA sequences, and purifying the RNA sequences from the test sample; (b) synthesizing first complementary DNA (cDNA) strands based on the RNA sequences and C-tailing the 3′-ends of the cDNA strands; (c) annealing a complementary template switching oligonucleotide to the C-tail of the cDNA and ligating the complementary template switching oligonucleotide to the 5′-ends of the RNA sequences to produce RNA templates; and (d) synthesizing a plurality of cDNA strands from the RNA templates using a strand-displacement reverse transcriptase. In some embodiments, one or more steps of the method may be carried out in a single reaction step. For example, steps (b) through (d) may be carried out in a single reaction tube utilizing a reaction mixture comprising RNA primers (e.g., random hexamer RNA primers, polyT primers, or a combination thereof), a strand-displacement reverse transcriptase (e.g., MMLV reverse transcriptase), an RNA ligase (e.g., T4 RNA ligase), and optionally, a polynucleotide kinase (e.g., T4 polynucleotide kinase).
In one embodiment, the method comprises the steps: (a) obtaining a test sample comprising one or more RNA sequences, and purifying the one or more RNA sequences from the test sample; (b) annealing a first RNA primer to the one or more RNA sequences; (c) extending the first RNA primer in a first nucleic acid extension reaction using reverse transcriptase, wherein the reverse transcriptase comprises reverse transcription and terminal transferase activities, to generate a plurality of DNA sequences complementary to the one or more RNA templates, and wherein the complementary DNA (cDNA) sequences further comprise a plurality of non-templated bases at the 3′-end of the cDNA sequences; (d) annealing a complementary nucleic acid sequence to the non-templated bases at the 3′-end of the cDNA sequence, wherein the complementary nucleic acid sequence further comprises a unique molecular identifier (UMI) or a unique sequence tag; (e) ligating the complementary nucleic acid sequence to the 5′-end of the one or more RNA sequences to generate one or more RNA templates, wherein the one or more RNA templates comprise the original one or more RNA sequences covalently linked to the complementary nucleic acid sequence comprising the UMI or unique sequence tag; (f) annealing one or more second RNA primers to the one or more RNA templates; and (g) extending the one or more second RNA primers in a second nucleic acid extension reaction using a strand-displacement reverse transcriptase to generate a plurality of DNA sequences complementary to the one or more RNA templates, wherein the plurality of complementary DNA (cDNA) sequences each comprise the complementary DNA sequence and a UMI or unique sequence tag.
In some embodiments, one or more steps of the method may be carried out in a single reaction step. For example, in some embodiments, steps (b) through (g) may be carried out in a single reaction tube utilizing a reaction mixture comprising RNA primers (e.g., random hexamer RNA primers, polyT primers, or a combination thereof), a strand-displacement reverse transcriptase (e.g., MMLV reverse transcriptase), an RNA ligase (e.g., T4 RNA ligase), and optionally, a polynucleotide kinase (e.g., T4 polynucleotide kinase). In one embodiment, a method involves preparing a sequencing library from a test sample comprising RNA molecules, the method comprising the steps: (a) obtaining a test sample comprising one or more RNA sequences, and purifying the one or more RNA sequences from the test sample; (b) annealing a first RNA primer to the one or more RNA sequences; (c) extending the first RNA primer in a first nucleic acid extension reaction using a reverse transcriptase, wherein the reverse transcriptase comprises reverse transcription and terminal transferase activities, to generate a plurality of DNA sequences complementary to the one or more RNA sequences, wherein the terminal transferase activity adds a cytosine (C) tail to the 3′-end of the complementary DNA (cDNA) sequences; (d) annealing a template switching oligonucleotide to the 3′-cytosine tail of the cDNA sequence, wherein the template switching oligonucleotide comprises a unique molecular identifier (UMI) or a unique sequence tag; (e) ligating the template switching oligonucleotide to the 5′-end of the one or more RNA sequences with T4 RNA ligase to generate one or more RNA templates, wherein the one or more RNA templates comprise the original one or more RNA sequences covalently linked to the template switching oligonucleotide and the UMI or unique sequence tag; (f) annealing a plurality of second RNA primers to the one or more RNA templates; and (g) extending the plurality of second RNA primers in a second nucleic acid extension reaction
using a strand-displacement reverse transcriptase to generate a plurality of DNA sequences complementary to the one or more RNA templates, wherein the plurality of complementary DNA (cDNA) sequences each comprise the complementary DNA sequence and a UMI or unique sequence tag. In some embodiments, one or more steps of the method can be carried out in a single reaction step. For example, steps (b) through (g) may be carried out in a single reaction tube utilizing a reaction mixture comprising RNA primers (e.g., random hexamer RNA primers, polyT primers, or a combination thereof), a strand-displacement reverse transcriptase (e.g., MMLV reverse transcriptase), an RNA ligase (e.g., T4 RNA ligase), and optionally, a polynucleotide kinase (e.g., T4 polynucleotide kinase).
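
By way of non-limiting illustration only, one way the UMI carried by each cDNA sequence might be used downstream is to group sequence reads into families that share a UMI and collapse each family to a consensus. The Python sketch below assumes, purely for the example, that the UMI occupies the first eight bases of each read and that a simple per-position majority vote is an adequate consensus. The read layout, the `UMI_LENGTH` value, and the function names are hypothetical and are not specified by the incorporated publication.

```python
from collections import Counter, defaultdict

UMI_LENGTH = 8  # hypothetical UMI length, assumed for this example only


def split_umi(read, umi_length=UMI_LENGTH):
    """Split a read into its leading UMI and the remaining cDNA-derived insert."""
    return read[:umi_length], read[umi_length:]


def consensus(sequences):
    """Per-position majority vote over the shortest common length."""
    length = min(len(s) for s in sequences)
    return "".join(
        Counter(s[i] for s in sequences).most_common(1)[0][0] for i in range(length)
    )


def collapse_by_umi(reads, umi_length=UMI_LENGTH):
    """Group reads by UMI and return {umi: (consensus_sequence, family_size)}."""
    families = defaultdict(list)
    for read in reads:
        umi, insert = split_umi(read, umi_length)
        families[umi].append(insert)
    return {umi: (consensus(seqs), len(seqs)) for umi, seqs in families.items()}


# Three reads share one UMI (one of them carries a single-base error);
# a fourth read carries a different UMI.
reads = [
    "AAAACCCC" + "ACGTACGT",
    "AAAACCCC" + "ACGTACGT",
    "AAAACCCC" + "ACGAACGT",
    "GGGGTTTT" + "TTGCA",
]
collapsed = collapse_by_umi(reads)
```

Collapsing by UMI in this way treats reads with the same tag as copies of one original molecule, so an error present in a minority of copies is outvoted. A production pipeline would typically also tolerate errors within the UMI itself, which this sketch does not attempt.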

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/081130, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method comprising obtaining a first biological sample from the subject, wherein the first biological sample comprises cell-free nucleic acid from the subject and potentially cell-free nucleic acid from a pathogen. In some embodiments, the method comprises performing a first assay comprising measuring a copy number of the cell-free nucleic acid from the pathogen in the first biological sample. In some embodiments, the method comprises obtaining a second biological sample from the subject, wherein the second biological sample comprises cell-free nucleic acid from the subject and potentially cell-free nucleic acid from a pathogen. In some embodiments, the method comprises performing a second assay comprising massively parallel sequencing of the cell-free nucleic acid in the second biological sample to generate sequence reads. In some embodiments, the method comprises determining an amount of the sequence reads that align to a reference genome of the pathogen. In some embodiments, the method comprises determining an amount of the cell-free nucleic acid molecules that have a size within a given range and align to a reference genome of the pathogen based on the massively parallel sequencing. In some embodiments, the method comprises screening for a tumor based on performing a first assay comprising measuring a copy number of the cell-free nucleic acid from the pathogen in the first biological sample, and performing a second assay comprising massively parallel sequencing of the cell-free nucleic acid in the second biological sample to generate sequence reads. In some embodiments, the first biological sample and the second biological sample are the same.
In some embodiments, the method further comprises determining a percentage of the sequence reads that align to a reference genome of the pathogen. In some embodiments, the method further comprises comparing the percentage of the sequence reads that align to a reference genome of the pathogen to a cutoff value. In some embodiments, the method further comprises determining a size ratio of a first proportion of the cell-free nucleic acid molecules from the second biological sample that align to the reference genome of the pathogen with a size within the given range and a second proportion of the cell-free nucleic acid molecules from the second biological sample that align to a reference genome of the subject with a size within the given range. In some embodiments, the method further comprises determining a size index, wherein the size index is an inverse of the size ratio, and comparing the size index to a second cutoff value. In some embodiments, the tumor is nasopharyngeal cancer. In some embodiments, the pathogen is Epstein-Barr Virus (EBV). In some embodiments, measuring a copy number of the cell-free nucleic acid from the pathogen in the first biological sample comprises amplification. In some embodiments, the amplification comprises polymerase chain reaction (PCR). In some embodiments, the PCR comprises quantitative PCR (qPCR). In some embodiments, the first biological sample and the second biological sample are plasma. In some embodiments, the method comprises obtaining a first biological sample from the subject, wherein the first biological sample comprises cell-free nucleic acid from the subject and potentially cell-free nucleic acid from a pathogen. In some embodiments, the method comprises performing a first assay comprising measuring a copy number of the cell-free nucleic acid from the pathogen in the first biological sample, wherein the first assay has a positive predictive value for a presence of the tumor in the subject.
In some embodiments, the method comprises performing a second assay on a second biological sample from the subject, wherein the second biological sample comprises cell-free nucleic acid from the subject and potentially cell-free nucleic acid from the pathogen, and wherein a positive predictive value for a presence of the tumor in the subject of the first assay and the second assay is at least 5-fold greater than the positive predictive value of the first assay. In some embodiments, the positive predictive value for a presence of the tumor in the subject of the first assay and the second assay is at least 7.5-fold greater than the positive predictive value of the first assay. In some embodiments, the positive predictive value for a presence of the tumor in the subject of the first assay and the second assay is at least 15%. In some embodiments, the positive predictive value for a presence of the tumor in the subject of the first assay and the second assay is at least 25%. In some embodiments, the first biological sample and the second biological sample are the same. In some embodiments, the first biological sample and the second biological sample are plasma. In some embodiments, the tumor is nasopharyngeal cancer. In some embodiments, the pathogen is Epstein-Barr Virus (EBV). In some embodiments, measuring a copy number of the cell-free nucleic acid from the pathogen in the first biological sample comprises amplification. In some embodiments, the amplification comprises polymerase chain reaction (PCR). In some embodiments, the PCR comprises quantitative PCR (qPCR). In some embodiments, the second assay comprises massively parallel sequencing of the cell-free nucleic acid in the second biological sample to generate sequence reads. In some embodiments, the second assay comprises determining an amount of the sequence reads that align to a reference genome of the pathogen.
In some embodiments, the second assay comprises determining an amount of the cell-free nucleic acid molecules in the second biological sample that have a size within a given range and align to a reference genome of the pathogen. In some embodiments, the method comprises obtaining a first biological sample from the subject, wherein the first biological sample comprises cell-free nucleic acid from the subject and potentially cell-free nucleic acid from a pathogen. In some embodiments, the method comprises performing a first assay comprising measuring a copy number of the cell-free nucleic acid from the pathogen in the first biological sample, wherein the first assay has a false positive rate for a presence of the tumor in the subject. In some embodiments, the method comprises performing a second assay on a second biological sample from the subject, wherein the second biological sample comprises cell-free nucleic acid from the subject and potentially cell-free nucleic acid from the pathogen, wherein a false positive rate for a presence of the tumor in the subject of the first assay and the second assay is at least 5-fold lower than the false positive rate of the first assay. In some embodiments, the false positive rate for a presence of the tumor in the subject of the first assay and the second assay is at least 10-fold lower than the false positive rate of the first assay. In some embodiments, the false positive rate for a presence of the tumor in the subject of the first assay and the second assay is less than 1%. In some embodiments, the first biological sample and the second biological sample are the same. In some embodiments, the first biological sample and the second biological sample are plasma. In some embodiments, the tumor is nasopharyngeal cancer. In some embodiments, the pathogen is Epstein-Barr Virus (EBV). In some embodiments, measuring a copy number of the cell-free nucleic acid from the pathogen in the first biological sample comprises amplification.
In some embodiments, the amplification comprises polymerase chain reaction (PCR). In some embodiments, the PCR comprises quantitative PCR (qPCR). In some embodiments, the second assay comprises massively parallel sequencing of the cell-free nucleic acid in the second biological sample to generate sequence reads. In some embodiments, the second assay comprises determining an amount of the sequence reads that align to a reference genome of the pathogen. In some embodiments, the second assay comprises determining an amount of the cell-free nucleic acid molecules in the second biological sample that have a size within a given range and align to a reference genome of the pathogen. In some embodiments, the method comprises analyzing a biological sample, including a mixture of cell-free nucleic acid molecules, to determine a level of pathology in a subject from which the biological sample is obtained, the mixture including nucleic acid molecules from the subject and potentially nucleic acid molecules from a pathogen. In some embodiments, the method comprises analyzing a first plurality of cell-free nucleic acid molecules from a biological sample of the subject, wherein the analyzing comprises determining a genomic position in a reference genome corresponding to at least one end of the first plurality of cell-free nucleic acid molecules, the reference genome corresponding to the pathogen. In some embodiments, the method comprises determining a first amount of the first plurality of cell-free nucleic acid molecules that end within one of first windows, each first window comprising at least one of a first set of genomic positions at which ends of cell-free nucleic acid molecules are present at a rate above a first threshold in subjects with a pathology associated with the pathogen.
In some embodiments, the method comprises computing a relative abundance of the first plurality of cell-free nucleic acid molecules ending within one of the first windows by normalizing the first amount using a second amount of the first plurality of cell-free nucleic acid molecules from the biological sample, wherein the second amount of the first plurality of cell-free nucleic acid molecules includes cell-free nucleic acid molecules ending at a second set of genomic positions outside of the first windows including the first set of genomic positions. In some embodiments, the method comprises determining the level of pathology in the subject by processing the relative abundance against one or more cutoff values. In some embodiments, processing the relative abundance against the one or more cutoff values includes determining whether the relative abundance is greater than the one or more cutoff values. In some embodiments, the method further comprises determining the second amount of the first plurality of cell-free nucleic acid molecules that end within one of second windows, each second window comprising at least one of the second set of genomic positions at which ends of cell-free nucleic acid molecules are present at a rate above a second threshold in subjects without a pathology resulting from the pathogen, wherein normalizing the first amount includes computing the relative abundance using the first amount and the second amount. In some embodiments, the method further comprises identifying the second set of genomic positions. In some embodiments, the identifying comprises analyzing, by a computer system, the cell-free nucleic acid molecules of a reference sample from a reference subject that does not have the pathology. In some embodiments, analyzing each of the plurality of cell-free nucleic acid molecules comprises determining a genomic position in the reference genome corresponding to at least one end of the cell-free nucleic acid molecule. In some embodiments, the reference subject is healthy.
In some embodiments, the relative abundance comprises a ratio of the first amount and the second amount. In some embodiments, the method further comprises identifying the first set of genomic positions at which ends of cell-free nucleic acid molecules occur at the rate above a first threshold. In some embodiments, identifying the first set of genomic positions comprises analyzing, by a computer system, a second plurality of cell-free nucleic acid molecules from at least one first additional sample to identify ending positions of the second plurality of cell-free nucleic acid molecules, wherein the at least one first additional sample is known to have the pathology associated with the pathogen and is of a same sample type as the biological sample. In some embodiments, the method further comprises, for each genomic window of a plurality of genomic windows, computing a corresponding number of the second plurality of cell-free nucleic acid molecules ending on the genomic window, and comparing the corresponding number to a reference value to determine whether the rate of cell-free nucleic acid molecules ending on one or more genomic positions within the genomic window is above the first threshold. In some embodiments, a first genomic window of the plurality of genomic windows has a width of at least one genomic position, and each of the genomic positions within the first genomic window is identified as having a rate of cell-free nucleic acid molecules ending on the genomic position above the first threshold when the corresponding number exceeds the reference value. In some embodiments, the first set of genomic positions have the highest N values for the corresponding numbers, wherein N is at least 100.
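
By way of non-limiting illustration only, the quantities described above (the percentage of sequence reads aligning to the pathogen genome, the size ratio, the size index as the inverse of the size ratio, and the relative abundance of fragment ends) can be sketched in Python as follows. The sketch takes the size ratio as the first (pathogen) proportion divided by the second (subject) proportion, which is an assumed reading of the text. The function names, the inclusive size range, the window coordinates, and all numeric inputs are hypothetical. No cutoff values are given because the text does not state any.

```python
def pathogen_read_percentage(pathogen_reads, total_reads):
    """Percentage of sequence reads that align to the pathogen reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * pathogen_reads / total_reads


def proportion_in_range(fragment_sizes, low, high):
    """Fraction of fragments whose size lies within [low, high] (inclusive)."""
    if not fragment_sizes:
        raise ValueError("no fragments supplied")
    return sum(low <= size <= high for size in fragment_sizes) / len(fragment_sizes)


def size_ratio(pathogen_sizes, subject_sizes, low, high):
    """First (pathogen) proportion divided by second (subject) proportion."""
    subject_proportion = proportion_in_range(subject_sizes, low, high)
    if subject_proportion == 0:
        raise ValueError("no subject fragments within the size range")
    return proportion_in_range(pathogen_sizes, low, high) / subject_proportion


def size_index(ratio):
    """Size index, defined in the text as the inverse of the size ratio."""
    if ratio == 0:
        raise ValueError("size ratio is zero; size index is undefined")
    return 1.0 / ratio


def count_ends_in_windows(end_positions, windows):
    """Count fragment ends falling inside any (start, end) window, inclusive."""
    return sum(any(start <= pos <= end for start, end in windows)
               for pos in end_positions)


def relative_end_abundance(end_positions, first_windows, second_windows):
    """Ratio of ends in the first windows to ends in the second windows."""
    second_amount = count_ends_in_windows(end_positions, second_windows)
    if second_amount == 0:
        raise ValueError("no fragment ends in the second windows")
    return count_ends_in_windows(end_positions, first_windows) / second_amount


# Hypothetical inputs for demonstration only.
percentage = pathogen_read_percentage(pathogen_reads=50, total_reads=1_000_000)
ratio = size_ratio([90, 100, 150, 160], [100, 150, 160, 170], low=80, high=110)
index = size_index(ratio)
abundance = relative_end_abundance(
    end_positions=[105, 110, 300, 305, 310, 900],
    first_windows=[(100, 120)],
    second_windows=[(290, 320)],
)
```

Each resulting value would then be compared against its own cutoff, for example by determining whether the relative abundance is greater than a cutoff value, as described above.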

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0119216, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for preparing and analyzing a single-stranded sequencing library from a double-stranded DNA (e.g., double-stranded cfDNA) sample. In some embodiments, the sample includes double-stranded DNA (dsDNA) molecules, and damaged dsDNA (e.g., nicked dsDNA) molecules. In some embodiments, the sample includes single-stranded DNA (ssDNA) molecules. The methods facilitate the collection of information, including strand-pairing and connectivity information, from dsDNA, ssDNA and damaged DNA (e.g., nicked DNA) molecules in a sample, thereby providing enhanced diagnostic information as compared to sequencing libraries that are prepared using conventional methods. Additionally or alternatively, detection of a genetic biomarker can include preparing a single stranded DNA (ssDNA) library for sequencing. For example, detection of a genetic biomarker can include using a ssDNA library preparation wherein both the forward (sense) and reverse (antisense) strands of a double stranded DNA fragment are tagged with an identical, or substantially identical, unique sequence tag (e.g., a partition-specific barcode or UMI) that allows for the complementary strands from a dsDNA molecule to be identified and analyzed.
In one embodiment, the method comprises preparing a single-stranded DNA library for sequencing, the method comprising the following steps: (a) obtaining a test sample comprising double stranded DNA (dsDNA) and isolating dsDNA from the test sample; (b) partitioning the dsDNA sample into a plurality of individual reaction compartments; (c) adding a reaction mixture to each of said individual reaction compartments, said reaction mixture including a plurality of oligonucleotides comprising a unique sequence tag; (d) denaturing dsDNA to produce single-strand DNA (ssDNA) fragments; and (e) ligating unique sequence tags to the ssDNA fragments. In another embodiment, a method is provided for preparing a cell-free DNA library for sequencing, the method comprising the following steps: (a) obtaining a test sample comprising cell-free double stranded DNA (dsDNA) and isolating dsDNA from the test sample; (b) partitioning the dsDNA sample into a plurality of individual reaction droplets; (c) adding a reaction mixture to each of said individual droplets, said reaction mixture including a plurality of DNA capture beads, wherein each of said DNA capture beads includes a plurality of attached oligonucleotides comprising a unique sequence tag; (d) heating the droplets to denature the dsDNA or chemically denaturing the dsDNA to produce single-strand DNA (ssDNA) fragments and to release the unique sequence tags from the beads; and (e) ligating the unique sequence tags to 3′ ends of the ssDNA fragments. In some embodiments, said beads are selected from the group comprising streptavidin-coated beads, solid phase reversible immobilization (SPRI) beads, and magnetic beads.
In another embodiment, a method is provided for preparing a single-stranded DNA library for sequencing, the method comprising the following steps: (a) providing a plurality of partitions, wherein individual partitions of the plurality comprise: (i) a portion of a test sample comprising, e.g., damaged and/or undamaged, double stranded DNA (dsDNA) isolated from one or more individuals; and (ii) a plurality of oligonucleotides, wherein the plurality of oligonucleotides comprise a partition-specific barcode; (b) incubating the partitions under conditions suitable to denature the double-stranded DNA into single-stranded DNA; and (c) ligating the single-stranded DNA to the oligonucleotides, wherein the ligating covalently links the partition-specific barcode to the single-stranded DNA and produces partition-specific barcoded single-stranded DNA. In some embodiments, the method further comprises combining the plurality of partitions. In some embodiments, the method further comprises hybridizing an oligonucleotide primer to the partition-specific barcoded single-stranded DNA and extending the primer, thereby producing partition-specific barcoded double-stranded DNA. In some embodiments, the method comprises amplifying the partition-specific barcoded single-stranded DNA and/or the partition-specific barcoded double-stranded DNA. In some embodiments, the method further comprises dephosphorylating the double stranded DNA isolated from one or more individuals. In some embodiments, the method comprises dephosphorylating the double stranded DNA isolated from one or more individuals and then partitioning the double stranded DNA isolated from one or more individuals, thereby providing the plurality of partitions.
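
By way of non-limiting illustration only, the use of a partition-specific barcode to identify complementary strands from the same dsDNA molecule can be sketched in Python as follows. The sketch assumes that each read has already been reduced to a (barcode, insert sequence) pair and that the two strands of a molecule appear as exact reverse complements of one another. Real reads would contain errors and uneven ends, so both assumptions are simplifications, and the function names and barcodes are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return sequence.translate(_COMPLEMENT)[::-1]


def pair_strands(barcoded_reads):
    """Pair complementary strands that share a partition-specific barcode.

    barcoded_reads: iterable of (barcode, insert_sequence) tuples.
    Returns {barcode: [(strand, complementary_strand), ...]}.
    """
    by_barcode = defaultdict(list)
    for barcode, sequence in barcoded_reads:
        by_barcode[barcode].append(sequence)

    pairs = {}
    for barcode, sequences in by_barcode.items():
        unmatched = list(sequences)
        found = []
        while unmatched:
            strand = unmatched.pop(0)
            partner = reverse_complement(strand)
            # Only strands from the same partition are considered as partners.
            if partner in unmatched:
                unmatched.remove(partner)
                found.append((strand, partner))
        pairs[barcode] = found
    return pairs


# Hypothetical reads: partition BC01 holds both strands of one molecule
# plus an unrelated single strand; partition BC02 holds one lone strand.
barcoded_reads = [
    ("BC01", "AACGT"),
    ("BC01", "ACGTT"),
    ("BC02", "GGGCC"),
    ("BC01", "TTTTT"),
]
strand_pairs = pair_strands(barcoded_reads)
```

Restricting the search to reads that share a barcode is what lets the strand-pairing information survive denaturation: two single strands are treated as partners only if they came from the same partition.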

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0087105, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for preparing and analyzing a sequencing library from a mixed cell-free DNA (cfDNA) sample, wherein the mixed sample includes double-stranded DNA (dsDNA), damaged dsDNA (e.g., nicked dsDNA), and single-stranded DNA (ssDNA) molecules. The subject methods facilitate the collection of information from dsDNA, ssDNA and damaged DNA (e.g., nicked DNA) molecules in a sample, thereby providing enhanced diagnostic information as compared to sequencing libraries that are prepared from dsDNA alone. In some embodiments, the method comprises preparing a combined cell-free DNA (cfDNA) sequencing library from a mixed cfDNA sample by: ligating a universal adapter comprising a unique sequence tag to at least one single-stranded DNA (ssDNA) molecule in the mixed cfDNA sample; extending the universal adapter to generate an ssDNA-derived double-stranded DNA (dsDNA) molecule; and generating a combined cfDNA sequencing library from the ssDNA-derived dsDNA molecule. In some embodiments, a method further comprises ligating a sequencing Y-adapter to the ssDNA-derived dsDNA molecule before generating the combined cfDNA sequencing library. In some embodiments, the sequencing Y-adapter comprises a unique sequence tag. In some embodiments, the first and second unique sequence tags are different. In some embodiments, the method further comprises: extending the second sequencing Y-adapter to generate a second nick-derived dsDNA molecule; ligating a third sequencing Y-adapter to the second nick-derived dsDNA molecule; and generating a combined cfDNA sequencing library from the first and the second nick-derived dsDNA molecules.
In some embodiments, the method for preparing a combined cfDNA sequencing library from a mixed cfDNA sample comprises: ligating a first sequencing Y-adapter to a first end of a nicked dsDNA molecule in the mixed cfDNA sample, wherein the nicked dsDNA molecule comprises a nicked strand and an unnicked strand; ligating a second sequencing Y-adapter to a second end of the nicked dsDNA molecule in the mixed cfDNA sample; denaturing the sequencing Y-adapter-ligated nicked dsDNA molecule to generate a first ssDNA molecule derived from the unnicked strand, a second ssDNA molecule derived from the nicked strand, and a third ssDNA molecule derived from the nicked strand; extending the second sequencing Y-adapter to generate a first nick-derived dsDNA molecule; ligating a third sequencing Y-adapter to the first nick-derived dsDNA molecule; and generating a combined cfDNA sequencing library from the first nick-derived dsDNA molecule. In some embodiments, the first sequencing Y-adapter comprises a first unique sequence tag, the second sequencing Y-adapter comprises a second unique sequence tag, and the third sequencing Y-adapter comprises a third unique sequence tag. In some embodiments, the first, second and third unique sequence tags are the same. In some embodiments, the first, second and third unique sequence tags are different. In some embodiments, the first and second unique sequence tags are the same, and the third unique sequence tag is different. In some embodiments, the first and third unique sequence tags are the same, and the second unique sequence tag is different. In some embodiments, the second and third unique sequence tags are the same, and the first unique sequence tag is different.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0002749, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, compositions, reaction mixtures, kits, and systems for sequencing both RNA and DNA from a single source sample. In some embodiments, RNA is treated so as to differentiate RNA sequences from DNA sequences derived from the same sample. In some embodiments, the RNA and DNA are cell-free polynucleotides. In some embodiments, the methods improve the sensitivity and/or base calling accuracy of sequencing methodologies in the identification of mutations (e.g. rare sequence variants). In some embodiments, the method comprises: (a) obtaining a sample comprising both RNA and DNA; (b) reverse transcribing the RNA to produce cDNA/RNA hybrid molecules; (c) degrading the RNA of the hybrid molecules to produce single-stranded cDNA; (d) preferentially joining a tag oligonucleotide comprising a tag sequence to the single-stranded cDNA in a reaction comprising a single-stranded DNA ligase to produce tagged cDNA; and (e) sequencing the DNA and the tagged cDNA; wherein the reverse transcribing, preferentially joining, and sequencing are performed in the presence of the DNA. In some embodiments, the RNA and DNA are cell-free nucleic acids. Nucleic acids (including cell-free nucleic acids) can be isolated from any of a variety of sources, such as blood, a blood fraction (e.g. serum or plasma), urine, and other bodily fluids. In some embodiments, the reverse transcribing comprises extension of primers comprising a random sequence (e.g.
one or more nucleotides selected at random from a set of two or more different nucleotides at one or more positions, with each of the different nucleotides selected at one or more positions represented in a pool of oligonucleotides comprising the random sequence). In some embodiments, the reverse transcribing comprises extension of the cDNA of the hybrid along a template-switch oligonucleotide (TSO), which may comprise a universal switch primer sequence. In some embodiments, the tag oligonucleotide is joined to a 3′ end of the single-stranded cDNA. In some embodiments, the tag oligonucleotide comprises a primer binding sequence. In some embodiments, the sequencing comprises amplifying the tagged cDNA to produce double-stranded tagged cDNA. In some embodiments, amplifying the tagged cDNA comprises extending a primer hybridized to the primer binding sequence. In some embodiments, the sequencing comprises joining sequencing adapters to the tagged cDNA and the DNA. In some embodiments, the tag oligonucleotide comprises a unique molecular identifier (UMI), wherein each of a plurality of tagged cDNA molecules is distinguishable from others in the plurality of tagged cDNA molecules based on the UMI (e.g. as determined by the sequence of the UMI, optionally in combination with the sequence of the cDNA). In some embodiments, the sample is blood, a blood fraction, plasma, serum, saliva, sputum, urine, semen, transvaginal fluid, cerebrospinal fluid, or stool. In some embodiments, the sample is blood or a blood fraction (e.g. serum or plasma). In some embodiments, the method further comprises using a processor to group RNA-derived sequences separately from DNA-derived sequences based on the presence or absence of the tag sequence, or a complement of the tag sequence. In some embodiments, the method further comprises identifying presence or absence of a condition of a subject (e.g. cancer) based on the RNA-derived sequences and the DNA-derived sequences.
In some embodiments, the method further comprises treating the subject based on the RNA-derived sequences and the DNA-derived sequences. In some embodiments, the method comprises: (a) obtaining a sample comprising both RNA and DNA; (b) joining a tag oligonucleotide comprising a tag sequence to the RNA in a reaction comprising an RNA ligase to produce tagged RNA; (c) reverse transcribing the tagged RNA to produce tagged cDNA; and (d) sequencing the DNA and the tagged cDNA; wherein the joining, reverse transcribing, and sequencing are performed in the presence of the DNA. In some embodiments, the RNA and DNA are cell-free nucleic acids. In some embodiments, the method further comprises fragmenting the RNA to produce fragmented RNA prior to joining the tag sequence. In some embodiments, the fragmented RNA has an average size within a pre-defined range (e.g. an average or median length from about 10 to about 1,000 nucleotides in length, such as between 10-800, 10-500, 50-500, 90-200, or 50-150 nucleotides; or an average or median length of less than 1500, 1000, 750, 500, 400, 300, 250, or fewer nucleotides in length). In some embodiments, fragmenting the RNA comprises subjecting the RNA and DNA to conditions that preferentially fragment the RNA. In some embodiments, fragmenting the RNA comprises sonication, chemical fragmentation, or heating. In some embodiments, the method further comprises dephosphorylating 3′ ends of fragmented RNA. In some embodiments, the tag oligonucleotide is joined to a 3′ end of the RNA. In some embodiments, the tag oligonucleotide comprises a primer binding sequence. In some embodiments, the reverse transcribing comprises extending a primer hybridized to the primer binding sequence. In some embodiments, the reverse transcribing comprises extension of the tagged cDNA along a template-switch oligonucleotide (TSO), which may comprise a universal switch primer sequence.
In some embodiments, the sequencing comprises amplifying the tagged cDNA to produce double-stranded tagged cDNA. In some embodiments, the sequencing comprises joining sequencing adapters to the tagged cDNA and the DNA. In some embodiments, the tag oligonucleotide comprises a unique molecular identifier (UMI), wherein each of a plurality of tagged cDNA molecules is distinguishable from others in the plurality of tagged cDNA molecules based on the UMI (e.g. as determined by the sequence of the UMI, optionally in combination with the sequence of the cDNA). In some embodiments, the sample is blood, a blood fraction, plasma, serum, saliva, sputum, urine, semen, transvaginal fluid, cerebrospinal fluid, or stool. In some embodiments, the sample is blood or a blood fraction (e.g. serum or plasma). In some embodiments, the reverse transcribing comprises extension of primers comprising a random sequence. In some embodiments, the method further comprises using a processor to group RNA-derived sequences separately from DNA-derived sequences based on the presence or absence of the tag sequence, or a complement of the tag sequence. In some embodiments, the method further comprises identifying presence or absence of a condition of a subject (e.g. cancer) based on the RNA-derived sequences and the DNA-derived sequences. In some embodiments, the method further comprises treating the subject based on the RNA-derived sequences and the DNA-derived sequences.
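
By way of non-limiting illustration, the processor-based grouping of RNA-derived and DNA-derived sequences described above might be sketched as follows. The tag sequence `TAG`, the function names, and the exact-substring matching rule are hypothetical choices made only for this sketch; they are not taken from the incorporated publication, and a practical implementation would likely tolerate sequencing errors in the tag.

```python
# Hypothetical tag sequence; a real tag is fixed by the library design.
TAG = "GATTACAG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def group_reads_by_tag(reads, tag=TAG):
    """Split reads into (RNA-derived, DNA-derived) lists.

    A read is treated as RNA-derived when it contains the tag sequence
    or its reverse complement, and as DNA-derived otherwise.
    Exact matching is an assumption of this sketch.
    """
    tag_rc = reverse_complement(tag)
    rna_derived, dna_derived = [], []
    for read in reads:
        if tag in read or tag_rc in read:
            rna_derived.append(read)
        else:
            dna_derived.append(read)
    return rna_derived, dna_derived


reads = ["TTGATTACAGCC", "AACTGTAATCGG", "CCCCGGGG"]
rna, dna = group_reads_by_tag(reads)
```

In this toy input the first read carries the tag, the second carries its reverse complement, and the third carries neither, so the first two are grouped as RNA-derived and the third as DNA-derived.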

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/218512, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for enriching a plurality of target nucleic acids in a sample, the methods comprising providing an endonuclease system, wherein each of the plurality of target nucleic acids comprises a first variant and a second variant, wherein the endonuclease system comprises a plurality of clustered regularly interspaced short palindromic repeat (CRISPR) RNAs (crRNAs), or derivatives thereof, each crRNA comprising a targeting sequence, and a plurality of CRISPR-associated (Cas) proteins, or variants thereof, each Cas protein capable of binding to a protospacer adjacent motif (PAM) site on a target nucleic acid, wherein the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and wherein the second variant does not comprise the PAM site or does not comprise the region complementary to the crRNA targeting sequence adjacent to the PAM site, and contacting the sample with the endonuclease system, thereby depleting the first variant and enriching the second variant of each of the plurality of target nucleic acids in the sample. In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the PAM site. In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the region complementary to the crRNA targeting sequence adjacent to the PAM site.
In some embodiments, the first variant of each target nucleic acid comprises a PAM site adjacent to a region complementary to a crRNA targeting sequence, and the second variant does not comprise the region complementary to the crRNA targeting sequence. In some embodiments, the methods comprise amplifying the enriched second variants of the plurality of target nucleic acids to produce an enriched sequencing library. In some embodiments, the methods comprise sequencing the enriched sequencing library to detect structural rearrangements or mutations in the target nucleic acids in the sample.
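
By way of non-limiting illustration, the distinction between a first variant that is depleted and a second variant that is enriched might be predicted in silico as sketched below. The sketch assumes, purely for illustration, an NGG-type PAM located immediately 3′ of the protospacer on the strand given, with sequences written on that strand; the function names, the 20-nucleotide targeting sequence, and the example variants are hypothetical and are not taken from the incorporated publication.

```python
import re


def has_cut_site(sequence, targeting_sequence, pam_pattern="[ACGT]GG"):
    """Return True if the protospacer is immediately followed by a PAM.

    Assumption: the sequence is written on the strand that matches the
    crRNA targeting sequence, and the PAM lies directly 3' of it.
    """
    pattern = re.escape(targeting_sequence) + pam_pattern
    return re.search(pattern, sequence) is not None


def predict_enriched(variants, targeting_sequence):
    """Return the names of variants expected to escape cleavage."""
    return sorted(
        name
        for name, seq in variants.items()
        if not has_cut_site(seq, targeting_sequence)
    )


# Hypothetical 20-nt targeting sequence and two variants of one target.
target = "GACGTTAGCCATGGATCTAA"
variants = {
    "first_variant": "TT" + target + "TGG" + "AC",   # PAM present
    "second_variant": "TT" + target + "TGA" + "AC",  # PAM lost
}
enriched = predict_enriched(variants, target)
```

Here only the second variant lacks the PAM adjacent to the protospacer, so it is the one predicted to remain after contact with the endonuclease system.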

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/127741, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and systems for high fidelity sequencing and identification of rare nucleic acid variants. The systems and methods may be used to identify rare variants in cell-free nucleic acid samples such as tumor specific mutations among a sample comprising a normal genomic nucleic acid majority. The systems and methods allow for the confident identification of mutations occurring at frequencies below 1:10,000 in a sample. Identification of such rare variants results from optimization of several steps in the sequencing process followed by analysis of sequencing reads based on aligned read pairs referred to as ensembles. The systems and methods may find applications outside of rare variant identification such as sequencing optimization for a desired level of performance or sensitivity. The methods include sequencing nucleic acid. Steps of the method may include obtaining sequencing reads of a nucleic acid, identifying an ensemble comprising two or more sequencing reads with shared start coordinates and read lengths, determining a number of sequenced molecules comprised by the ensemble, identifying a candidate variant in the ensemble, and determining a likelihood of the candidate variant being a true variant using a likelihood estimation model and the determined number of sequenced molecules. In certain embodiments, the step of obtaining sequencing reads may further comprise preparing a sequencing library from the nucleic acid, amplifying the sequencing library, and sequencing the sequencing library using next generation sequencing (NGS). In certain embodiments, adapters may be ligated to the nucleic acid under conditions configured to allow adapter stacking.
The preparation of the sequencing library may comprise ligating adapters to the nucleic acid at a temperature of about 16 degrees Celsius using a reaction time of about 16 hours. The amplification step may comprise PCR amplification and the methods may further comprise selecting an over-amplification factor and a PCR cycle number required to detect variants at a specified concentration in a sample using an in-silico model. In various embodiments, the methods include designing a hybrid capture panel to target a genomic region based on factors comprising guanine-cytosine (GC) content, mutation frequency in a target population, and sequence uniqueness, and capturing the amplified nucleic acid using the hybrid capture panel before the sequencing step. The capturing step may include using a first hybrid capture panel targeting a sense strand of a target locus and a second hybrid capture panel targeting an antisense strand of the target locus. In certain embodiments, a synthetic nucleic acid control, also referred to as control sequence, control spike-in, or positive control, may be added to the nucleic acid before amplification of the sequencing library and error rate may then be determined using sequencing reads of the synthetic nucleic acid control. The synthetic nucleic acid control may comprise a known sequence having low diversity across a species from which the nucleic acid is derived and having a plurality of non-naturally occurring mismatches to the known sequence and, in certain embodiments, the plurality of non-naturally occurring mismatches can be 4. The synthetic nucleic acid control may include a guanine-cytosine (GC) content distribution that is representative of the target loci of the hybrid capture panel or may include a plurality of nucleic acids comprising varying overlaps with a pull down probe of the hybrid capture panel. Error rate or candidate variant frequency may be determined using sequencing reads of the synthetic nucleic acid control.
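
By way of non-limiting illustration, identifying ensembles as groups of two or more sequencing reads that share a start coordinate and a read length might be sketched as follows. The read representation (a dictionary with `name`, `chrom`, `start`, and `length` keys) and the function name are hypothetical conveniences of this sketch, and the likelihood estimation model referred to above is not reproduced here.

```python
from collections import defaultdict


def find_ensembles(reads):
    """Group reads by (chromosome, start coordinate, read length).

    Only groups with two or more reads are returned, mirroring the
    definition of an ensemble given in the text above.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["length"])
        groups[key].append(read["name"])
    return {key: names for key, names in groups.items() if len(names) >= 2}


# Toy aligned reads; all values are illustrative only.
reads = [
    {"name": "r1", "chrom": "chr1", "start": 100, "length": 150},
    {"name": "r2", "chrom": "chr1", "start": 100, "length": 150},
    {"name": "r3", "chrom": "chr1", "start": 100, "length": 148},
    {"name": "r4", "chrom": "chr2", "start": 500, "length": 150},
]
ensembles = find_ensembles(reads)
```

In this toy input only `r1` and `r2` share both a start coordinate and a read length, so they form the single ensemble returned.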

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,902,992, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for detecting copy number variation comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides is optionally attached to unique barcodes; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads obtained from step (a) to a reference sequence; d) quantifying/counting mapped reads in two or more predefined regions of the reference sequence; e) determining a copy number variation in one or more of the predefined regions by (i) normalizing the number of reads in the predefined regions to each other and/or the number of unique barcodes in the predefined regions to each other; and (ii) comparing the normalized numbers obtained in step (i) to normalized numbers obtained from a control sample.
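
By way of non-limiting illustration, the normalization and control-comparison steps of the copy number variation method described above might be sketched as follows. The region names, read counts, and the gain/loss cut-offs of 1.25 and 0.75 are hypothetical values chosen only for this sketch, and the sketch assumes every region has a nonzero count in the control sample.

```python
def normalize(counts):
    """Express each region's read count as a fraction of all counted reads."""
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}


def copy_number_ratios(sample_counts, control_counts):
    """Ratio of normalized sample counts to normalized control counts."""
    sample = normalize(sample_counts)
    control = normalize(control_counts)
    return {region: sample[region] / control[region] for region in sample}


def call_cnv(ratios, gain=1.25, loss=0.75):
    """Label each region; the cut-offs are illustrative assumptions."""
    calls = {}
    for region, ratio in ratios.items():
        if ratio >= gain:
            calls[region] = "gain"
        elif ratio <= loss:
            calls[region] = "loss"
        else:
            calls[region] = "neutral"
    return calls


# Toy mapped-read counts per predefined region.
sample_counts = {"A": 200, "B": 100, "C": 100, "D": 100, "E": 100}
control_counts = {"A": 100, "B": 100, "C": 100, "D": 100, "E": 100}
ratios = copy_number_ratios(sample_counts, control_counts)
calls = call_cnv(ratios)
```

Because the counts are normalized to each other, an excess in one region slightly lowers the ratios of the remaining regions; here region A is labeled a gain while the others stay within the neutral band.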
Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from a subject comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or mutation(s); and g) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample. Additionally or alternatively, detection of a genetic biomarker can include a method of characterizing the heterogeneity of an abnormal condition in a subject, the method comprising generating a genetic profile of extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of data resulting from copy number variation and/or other rare mutation (e.g., genetic alteration) analyses.
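
By way of non-limiting illustration, the per-position ratio of variant-supporting reads to total reads in the rare mutation method described above, and its comparison with a reference sample, might be sketched as follows. The position keys, read counts, and the `min_excess` cut-off are hypothetical, and the normalization step of the method is omitted from this sketch for brevity.

```python
def variant_ratios(variant_reads, total_reads):
    """Per-position ratio of variant-supporting reads to total reads.

    Positions with zero depth are skipped to avoid division by zero.
    """
    return {
        pos: variant_reads.get(pos, 0) / depth
        for pos, depth in total_reads.items()
        if depth > 0
    }


def candidate_variants(sample_ratios, reference_ratios, min_excess=0.001):
    """Positions whose ratio exceeds the reference sample's ratio.

    The min_excess cut-off is an illustrative assumption.
    """
    return sorted(
        pos
        for pos, ratio in sample_ratios.items()
        if ratio - reference_ratios.get(pos, 0.0) > min_excess
    )


# Toy counts at two mappable base positions.
total_reads = {100: 10000, 101: 10000}
variant_reads = {100: 50, 101: 2}
reference_ratios = {100: 0.0002, 101: 0.0002}

sample_ratios = variant_ratios(variant_reads, total_reads)
candidates = candidate_variants(sample_ratios, reference_ratios)
```

In this toy input, position 100 shows a variant ratio well above the reference sample while position 101 does not, so only position 100 is reported as a candidate.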
In some embodiments, the prevalence/concentration of each rare variant identified in the subject is reported and quantified simultaneously. In other embodiments, a confidence score, regarding the prevalence/concentrations of rare variants in the subject, is reported. In some embodiments, extracellular polynucleotides comprise DNA. In other embodiments, extracellular polynucleotides comprise RNA. Polynucleotides may be fragments or fragmented after isolation. Additionally or alternatively, detection of a genetic biomarker can include a method for circulating nucleic acid isolation and extraction. In some embodiments, extracellular polynucleotides are isolated from a bodily sample that may be selected from a group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears. In some embodiments, the methods also comprise a step of determining the percent of sequences having copy number variation or other rare genetic alteration (e.g., sequence variants) in said bodily sample.
In some embodiments, the percent of sequences having copy number variation in said bodily sample is determined by calculating the percentage of predefined regions with an amount of polynucleotides above or below a predetermined threshold. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a rare mutation in a cell-free or a substantially cell free sample obtained from a subject comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generate a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or other genetic alteration(s); and g) comparing the resulting number for each of the regions. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: a. providing at least one set of tagged parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides.
In certain embodiments, the method further comprises: e. analyzing the set of consensus sequences for each set of tagged parent molecules.
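
By way of non-limiting illustration, collapsing a set of sequencing reads into consensus sequences, one per tagged parent polynucleotide, might be sketched as follows. Grouping reads by an exact tag match and taking a per-position majority vote are assumptions of this sketch rather than the procedure of the incorporated patent; the function name and toy reads are likewise hypothetical.

```python
from collections import Counter, defaultdict


def collapse_to_consensus(tagged_reads):
    """Collapse (tag, sequence) pairs into one consensus per tag.

    Reads sharing a tag are treated as progeny of the same parent
    molecule; each consensus base is the most frequent base at that
    position, over the length of the shortest read in the family.
    """
    families = defaultdict(list)
    for tag, seq in tagged_reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        length = min(len(s) for s in seqs)
        consensus[tag] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Toy reads: family T1 contains one read with a likely amplification
# or sequencing error at the third base.
tagged_reads = [
    ("T1", "ACGT"),
    ("T1", "ACGT"),
    ("T1", "ACTT"),
    ("T2", "GGCC"),
]
consensus = collapse_to_consensus(tagged_reads)
```

The single discordant base in family T1 is outvoted by the two concordant reads, which is the error-suppressing effect that collapsing is intended to provide.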

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/064629, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a bait set panel comprising one or more bait sets that selectively enrich for one or more nucleosome-associated regions of a genome, said nucleosome-associated regions comprising genomic regions having one or more genomic base positions with differential nucleosomal occupancy, wherein the differential nucleosomal occupancy is characteristic of a cell or a tissue type of origin or a disease state. In some embodiments, each of the one or more nucleosome-associated regions of a bait set panel comprises at least one of: (i) significant structural variation, comprising a variation in nucleosomal positioning, said structural variation selected from the group consisting of: an insertion, a deletion, a translocation, a gene rearrangement, methylation status, a microsatellite, a copy number variation, a copy number-related structural variation, or any other variation which indicates differentiation; and (ii) instability, comprising one or more significant fluctuations or peaks in a genome partitioning map indicating one or more locations of nucleosomal map disruptions in a genome. In some embodiments, the one or more bait sets of a bait set panel are configured to capture nucleosome-associated regions of the genome based on a function of a plurality of reference nucleosomal occupancy profiles (i) associated with one or more disease states and one or more non-disease states; (ii) associated with a known somatic mutation, such as SNV, CNV, indel, or re-arrangement; and/or (iii) associated with differential expression patterns. In an embodiment, the one or more bait sets of a bait set panel selectively enrich for one or more nucleosome-associated regions in a cell-free deoxyribonucleic acid (cfDNA) sample.
Additionally or alternatively, detection of a genetic biomarker can include a method for enriching a nucleic acid sample for nucleosome-associated regions of a genome comprising (a) bringing a nucleic acid sample in contact with a bait set panel, said bait set panel comprising one or more bait sets that selectively enrich for one or more nucleosome-associated regions of a genome; and (b) enriching the nucleic acid sample for one or more nucleosome-associated regions of a genome. Additionally or alternatively, detection of a genetic biomarker can include a method for generating a bait set comprising (a) identifying one or more regions of a genome, said regions associated with a nucleosome profile, and (b) selecting a bait set to selectively capture said regions. In an embodiment, a bait set in a bait set panel selectively enriches for one or more nucleosome-associated regions in a cell-free deoxyribonucleic acid sample. Additionally or alternatively, detection of a genetic biomarker can include a method for enriching for multiple genomic regions comprising bringing a predetermined amount of a nucleic acid sample in contact with a bait panel comprising (i) a first bait set that selectively hybridizes to a first set of genomic regions of the nucleic acid sample, provided at a first concentration ratio that is less than a saturation point of the first bait set, and (ii) a second bait set that selectively hybridizes to a second set of genomic regions of the nucleic acid sample, provided at a second concentration ratio that is associated with a saturation point of the second bait set; and enriching the nucleic acid sample for the first set of genomic regions and the second set of genomic regions.
Additionally or alternatively, detection of a genetic biomarker can include a method for improving accuracy of detecting an insertion or deletion (indel) from a plurality of sequence reads derived from cell-free deoxyribonucleic acid (cfDNA) molecules in a bodily sample of a subject, which plurality of sequence reads are generated by nucleic acid sequencing, comprising (a) for each of the plurality of sequence reads associated with the cell-free DNA molecules, providing: a predetermined expectation of an indel being detected in one or more sequence reads of the plurality of sequence reads; a predetermined expectation that a detected indel is a true indel present in a given cell-free DNA molecule of the cell-free DNA molecules, given that an indel has been detected in the one or more of the sequence reads; and a predetermined expectation that a detected indel is introduced by non-biological error, given that an indel has been detected in the one or more of the sequence reads; (b) providing quantitative measures of one or more model parameters characteristic of sequence reads generated by nucleic acid sequencing; (c) detecting one or more candidate indels in the plurality of sequence reads associated with the cell-free DNA molecules; and (d) for each candidate indel, performing a hypothesis test using one or more of the model parameters to classify said candidate indel as a true indel or an introduced indel, thereby improving accuracy of detecting an indel.
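
By way of non-limiting illustration, a hypothesis test that classifies a candidate indel as a true indel or an introduced indel might be sketched as follows. The incorporated publication does not specify the test shown here: this sketch substitutes a simple one-sided binomial test in which the null hypothesis is that every indel-supporting read arises from non-biological error at a fixed per-read rate. The `error_rate` and `alpha` values and the function names are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large depths do not
    overflow or underflow intermediate values.
    """
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p)
        )
        total += exp(log_term)
    return total


def classify_indel(supporting_reads, depth, error_rate=0.001, alpha=1e-6):
    """Classify a candidate indel with a one-sided binomial test.

    Null hypothesis (an assumption of this sketch): all supporting
    reads are errors occurring independently at `error_rate` per read.
    """
    p_value = binomial_tail(supporting_reads, depth, error_rate)
    label = "true indel" if p_value < alpha else "introduced indel"
    return label, p_value


label_high, p_high = classify_indel(supporting_reads=10, depth=1000)
label_low, p_low = classify_indel(supporting_reads=1, depth=1000)
```

With these illustrative parameters, ten supporting reads at a depth of 1,000 are very unlikely under the error-only null and the candidate is classified as a true indel, whereas a single supporting read is consistent with error alone.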
Additionally or alternatively, detection of a genetic biomarker can include a kit comprising (a) a sample comprising a predetermined amount of DNA; and (b) a bait set panel comprising (i) a first bait set that selectively hybridizes to a first set of genomic regions of a nucleic acid sample comprising a predetermined amount of DNA, provided at a first concentration ratio that is less than a saturation point of the first bait set and (ii) a second bait set that selectively hybridizes to a second set of genomic regions of the nucleic acid sample, provided at a second concentration ratio that is associated with a saturation point of the second bait set. Additionally or alternatively, detection of a genetic biomarker can include a method for enriching for multiple genomic regions, comprising: (a) bringing a predetermined amount of nucleic acid from a sample in contact with a bait mixture comprising (i) a first bait set that selectively hybridizes to a first set of genomic regions of the nucleic acid from the sample, which first bait set is provided at a first concentration that is less than a saturation point of the first bait set, and (ii) a second bait set that selectively hybridizes to a second set of genomic regions of the nucleic acid sample, which second bait set is provided at a second concentration that is associated with a saturation point of the second bait set; and (b) enriching the nucleic acid sample for the first set of genomic regions and the second set of genomic regions.
Additionally or alternatively, detection of a genetic biomarker can include a method for enriching multiple genomic regions, comprising: (a) bringing a predetermined amount of nucleic acid from a sample in contact with a bait mixture comprising: (i) a first bait set that selectively hybridizes to a first set of genomic regions of the nucleic acid from the sample, which first bait set is provided at a first concentration that is less than a saturation point of the first bait set, and (ii) a second bait set that selectively hybridizes to a second set of genomic regions of the nucleic acid from the sample, which second bait set is provided at a second concentration that is at or above a saturation point of the second bait set; and (b) enriching the nucleic acid from the sample for the first set of genomic regions and the second set of genomic regions, thereby producing an enriched nucleic acid.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,790,559, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for detecting copy number variation comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides is optionally attached to unique barcodes; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads obtained from step (a) to a reference sequence; d) quantifying/counting mapped reads in two or more predefined regions of the reference sequence; e) determining a copy number variation in one or more of the predefined regions by (i) normalizing the number of reads in the predefined regions to each other and/or the number of unique barcodes in the predefined regions to each other; and (ii) comparing the normalized numbers obtained in step (i) to normalized numbers obtained from a control sample.
Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a rare mutation in a cell-free or substantially cell-free sample obtained from a subject comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or mutation(s); and g) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample. Additionally or alternatively, detection of a genetic biomarker can include a method of characterizing the heterogeneity of an abnormal condition in a subject, the method comprising generating a genetic profile of extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of data resulting from copy number variation and/or other rare mutation (e.g., genetic alteration) analyses.
Additionally or alternatively, detection of a genetic biomarker can include a system comprising a computer readable medium for performing the following steps: selecting predefined regions in a genome; enumerating the number of sequence reads in the predefined regions; normalizing the number of sequence reads across the predefined regions; and determining percent of copy number variation in the predefined regions. In some embodiments, the entirety of the genome or at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the genome is analyzed. In some embodiments, the computer readable medium provides data in percent cancer DNA or RNA in plasma or serum to the end user. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a rare mutation in a cell-free or a substantially cell-free sample obtained from a subject comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) filtering out reads that fail to meet a set threshold; c) mapping sequence reads derived from the sequencing onto a reference sequence; d) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; e) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; f) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or other genetic alteration(s); and g) comparing the resulting number for each of the regions. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: a.
providing at least one set of tagged parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a set of sequencing reads; and d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides. In certain embodiments, the method further comprises: e. analyzing the set of consensus sequences for each set of tagged parent molecules. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: a. providing at least one set of tagged parent polynucleotides, and for each set of tagged parent polynucleotides; b. amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c. sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a set of sequencing reads; d. collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides; and e. filtering out from among the consensus sequences those that fail to meet a quality threshold. In one embodiment, the quality threshold considers a number of sequence reads from amplified progeny polynucleotides collapsed into a consensus sequence. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: a.
providing at least one set of tagged parent polynucleotides, wherein each set maps to a different reference sequence in one or more genomes, and, for each set of tagged parent polynucleotides; i. amplifying the first polynucleotides to produce a set of amplified polynucleotides; ii. sequencing a subset of the set of amplified polynucleotides, to produce a set of sequencing reads; and iii. collapsing the sequence reads by: 1. grouping sequence reads sequenced from amplified progeny polynucleotides into families, each family amplified from the same tagged parent polynucleotide. In one embodiment, collapsing further comprises: 2. determining a quantitative measure of sequence reads in each family. In another embodiment, the method further comprises (including a): b. determining a quantitative measure of unique families; and c. based on (1) the quantitative measure of unique families and (2) the quantitative measure of sequence reads in each group, inferring a measure of unique tagged parent polynucleotides in the set. In another embodiment, inferring is performed using statistical or probabilistic models. In another embodiment, the method further comprises using a control or set of control samples to correct for amplification or representation biases between the two sets. In another embodiment, the method further comprises determining copy number variation between the sets. In another embodiment, the method further comprises (including a, b, c): d. determining a quantitative measure of polymorphic forms among the families; and e. based on the determined quantitative measure of polymorphic forms, inferring a quantitative measure of polymorphic forms in the number of inferred unique tagged parent polynucleotides. 
In another embodiment, polymorphic forms include but are not limited to: substitutions, insertions, deletions, inversions, microsatellite changes, transversions, translocations, fusions, methylation, hypermethylation, hydroxymethylation, acetylation, epigenetic variants, regulatory-associated variants or protein binding sites. In another embodiment, wherein the sets derive from a common sample, the method further comprises: a. inferring copy number variation for the plurality of sets based on a comparison of the inferred number of tagged parent polynucleotides in each set mapping to each of a plurality of reference sequences. In another embodiment, the original number of polynucleotides in each set is further inferred. Additionally or alternatively, detection of a genetic biomarker can include a system comprising a computer readable medium for performing the aforesaid methods. Additionally or alternatively, detection of a genetic biomarker can include a method of communicating sequence information about at least one individual polynucleotide molecule comprising: a. providing at least one individual polynucleotide molecule; b. encoding sequence information in the at least one individual polynucleotide molecule to produce a signal; c. passing at least part of the signal through a channel to produce a received signal comprising nucleotide sequence information about the at least one individual polynucleotide molecule, wherein the received signal comprises noise and/or distortion; d. decoding the received signal to produce a message comprising sequence information about the at least one individual polynucleotide molecule, wherein decoding reduces noise and/or distortion in the message; and e. providing the message to a recipient. In one embodiment, the noise comprises incorrect nucleotide calls. In another embodiment, distortion comprises uneven amplification of the individual polynucleotide molecule compared with other individual polynucleotide molecules. 
In another embodiment, distortion results from amplification or sequencing bias. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a rare mutation in a cell-free or substantially cell free sample obtained from a subject comprising: a) sequencing extracellular polynucleotides from a bodily sample from a subject, wherein each of the extracellular polynucleotides generates a plurality of sequencing reads; b) performing multiplex sequencing on regions or whole-genome sequencing if enrichment is not performed; c) filtering out reads that fail to meet a set threshold; d) mapping sequence reads derived from the sequencing onto a reference sequence; e) identifying a subset of mapped sequence reads that align with a variant of the reference sequence at each mappable base position; f) for each mappable base position, calculating a ratio of (a) a number of mapped sequence reads that include a variant as compared to the reference sequence, to (b) a number of total sequence reads for each mappable base position; g) normalizing the ratios or frequency of variance for each mappable base position and determining potential rare variant(s) or mutation(s); and h) comparing the resulting number for each of the regions with potential rare variant(s) or mutation(s) to similarly derived numbers from a reference sample. 
Additionally or alternatively, detection of a genetic biomarker can include a method of characterizing the heterogeneity of an abnormal condition in a subject, the method comprising generating a genetic profile of extracellular polynucleotides in the subject, wherein the genetic profile comprises a plurality of data resulting from copy number variation and rare mutation analyses. Additionally or alternatively, detection of a genetic biomarker can include a method comprising determining copy number variation or performing rare mutation analysis in a cell-free or substantially cell free sample obtained from a subject using multiplex sequencing. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: a) providing at least one set of tagged parent polynucleotides, and for each set of tagged parent polynucleotides; b) amplifying the tagged parent polynucleotides in the set to produce a corresponding set of amplified progeny polynucleotides; c) sequencing a subset (including a proper subset) of the set of amplified progeny polynucleotides, to produce a set of sequencing reads; d) collapsing the set of sequencing reads to generate a set of consensus sequences, each consensus sequence corresponding to a unique polynucleotide among the set of tagged parent polynucleotides; and e) filtering out from among the consensus sequences those that fail to meet a quality threshold.
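
By way of illustration only, the collapsing and filtering steps recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any incorporated reference: reads are assumed to arrive as (tag, sequence) pairs, the consensus is assumed to be a per-position majority vote, and the quality threshold is assumed to be a minimum number of reads per family. The names `collapse_reads` and `min_family_size` are hypothetical.

```python
from collections import Counter, defaultdict


def collapse_reads(reads, min_family_size=2):
    """Collapse tagged reads into one consensus sequence per tag.

    reads: iterable of (tag, sequence) pairs (assumed input shape).
    min_family_size: hypothetical quality threshold; families with
    fewer reads than this are filtered out.
    """
    # Group reads into families that share the same parent tag.
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        # Quality filter based on the number of reads collapsed.
        if len(seqs) < min_family_size:
            continue
        length = min(len(s) for s in seqs)
        # Assumed consensus rule: majority base at each position.
        consensus[tag] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


example_reads = [
    ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACGA"),  # one read has an error
    ("CCG", "TTGA"), ("CCG", "TTGA"),
    ("GGA", "CCCC"),                                    # singleton family
]
example_consensus = collapse_reads(example_reads)
```

In this sketch, the single discordant base in the "AAT" family is outvoted, and the singleton "GGA" family is dropped by the assumed read-count threshold.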

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,700,286, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a cancer detection assay in plasma/serum measuring by adding and comparing the amount of DNA and RNA of certain genes in the plasma/serum of cancer patients that are the reflection of a gene amplification and a gene over expression. Thus, gene amplification (seen by more DNA) and gene over expression (more RNA) are linked. Additionally or alternatively, detection of a genetic biomarker can include a method for the diagnosis or the follow up of the evolution of cancers which comprises measuring together gene over expression (RNA) and gene amplification (DNA) in the bodily fluids of patients suspected to harbor cancer on any gene that is both amplified and over expressed in cancer cells and comparing to healthy controls. More particularly, RNA and DNA are extracted from a bodily fluid, such as plasma, serum, sputum, saliva, etc., purified and amplified, and the over expressed RNA and amplified DNA are analyzed and compared to a unique house keeping gene. In some embodiments, the nucleic acids are amplified by reverse transcriptase polymerase chain reaction (RT-PCR) and are analyzed by gel coloration, by radioactive immunological technique (RIA), by enzyme linked immunosorbent test (ELISA) or by a microchip test (gene array), and possibly quantified by any method for nucleic acid quantification. In some embodiments, the quantification of RNA and DNA is carried out by real time PCR, such as “TAQMAN™”, or on capillaries “LIGHTCYCLER™”, or real time PCR and RT PCR of any company. 
In some embodiments, the genes analyzed may be compared to a reference nucleic acid extract (DNA and RNA) corresponding to the expression (RNA) and quantity (DNA) of a unique house keeping gene, or to a reference RNA corresponding to the expression of a house keeping coding gene, or to a reference DNA corresponding to a unique gene, or may be estimated in reference to a standard curve obtained with nucleic acids of a cell line.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Patent Application Publication No. 2018/0195131, which is herebyincorporated by reference in its entirety. For example, detection of agenetic biomarker can include methods to detect fusion genes, which maybe used to detect a disease, such as cancer. Additionally oralternatively, detection of a genetic biomarker can include methods forenrichment of breakpoint fragments, such as to detect and characterizefusion genes, which may be associated with a disease, such as cancer.Additionally or alternatively, detection of a genetic biomarker caninclude a method for providing a diagnostic or therapeutic interventionto a subject having or suspected of having cancer, comprising (a)providing a biological sample comprising cell-free nucleic acidmolecules from a subject; (b) contacting the cell-free nucleic acidmolecules from the biological sample with a probe set underhybridization conditions sufficient to produce probe-capturedpolynucleotides, which probe set comprises a plurality of polynucleotideprobes, wherein each of the plurality of polynucleotide probes has (i)sequence complementarity with a fusion gene and (ii) affinity for thefusion gene that is greater than a polynucleotide having sequencecomplementary with the fusion gene and containing only unmodifiednucleotides; (c) isolating the probe-captured polynucleotides from themixture, to produce a sample enriched with isolated polynucleotidescomprising breakpoint fragments of the fusion gene; (d) sequencing theisolated polynucleotides to produce sequences; (e) detectingpolynucleotides comprising breakpoints of fusion genes based on thesequences; and (f) providing the diagnostic or therapeutic interventionbased on the detection of breakpoint fragments. 
Additionally oralternatively, detection of a genetic biomarker can include a method forcapturing a breakpoint fragment of a fusion gene, comprising (a)providing a biological sample containing or suspected of containing acell-free nucleic acid molecule comprising the breakpoint fragment ofthe fusion gene; and (b) contacting the biological sample with apolynucleotide probe under conditions sufficient to (i) permithybridization between the polynucleotide probe and the breakpointfragment to provide a probe-captured polynucleotide in a mixture, whichpolynucleotide probe has sequence complementarity with the breakpointfragment and has affinity for the fusion gene that is greater than apolynucleotide having sequence complementary with the fusion gene andcontaining only unmodified nucleotides; and (ii) enrichment or isolationof the probe-captured polynucleotide from the mixture, wherein thepolynucleotide probe has sequence complementarity with the breakpointfragment. Additionally or alternatively, detection of a geneticbiomarker can include a probe set comprising a plurality ofpolynucleotide probes, wherein each of the polynucleotide probes has (i)sequence complementarity with a fusion gene as part of a cell-freenucleic acid molecule and (ii) affinity for the fusion gene that isgreater than a polynucleotide having sequence complementary with thefusion gene and containing only unmodified nucleotides. Additionally oralternatively, detection of a genetic biomarker can includes a highaffinity polynucleotide, comprising a sequence that is configured tospecifically hybridize to a nucleic acid sequence associated with afusion gene in a cell-free nucleic acid molecule. Additionally oralternatively, detection of a genetic biomarker can include a highaffinity polynucleotide configured to specifically hybridize to a fusiongene. In one embodiment, the high affinity polynucleotide comprises oneor more locked nucleic acid nucleotides. 
In another embodiment the highaffinity polynucleotide has a melting temperature that is at least anyof 1° C., 2° C., 3° C., 4° C., 5° C., 10° C., 15° C. or 20° C. higherthan a polynucleotide with the same sequence comprising only naturalnucleotides. In another embodiment, the high affinity polynucleotide hasa melting temperature that is at least any of 2%, 4%, 6%, 8%, or 10%higher than a polynucleotide with the same sequence comprising onlynatural nucleotides. In another embodiment, the high affinitypolynucleotide is configured to specifically hybridize to a cancerfusion gene. Additionally or alternatively, detection of a geneticbiomarker can include a high affinity polynucleotide probe comprising ahigh affinity polynucleotide configured to specifically hybridize to afusion gene. In one embodiment, the high affinity polynucleotidecomprises one or more locked nucleic acid nucleotides. In anotherembodiment, the probe comprises a functionality selected from adetectable label, a binding moiety or a solid support. In anotherembodiment, the probe is configured to hybridize to a breakpointfragment of a fusion gene. In another embodiment, the breakpointfragment has a length between about 140 nucleotides and about 180nucleotides. In another embodiment the fragment is cell-freedeoxyribonucleic acid (DNA) or genomic DNA. In another embodiment, thehigh affinity polynucleotide is bound to a solid support. Additionallyor alternatively, detection of a genetic biomarker can include a methodfor capturing a breakpoint fragment of a fusion gene comprisingcontacting the breakpoint fragment with a high affinity polynucleotideprobe under stringent hybridization conditions and allowinghybridization, wherein the polynucleotide probe is bound to a solidsupport and wherein the polynucleotide probe has a nucleotide sequencethat is substantially or perfectly complementary to a nucleotidesequence of the breakpoint fragment. 
In one embodiment, the highaffinity polynucleotide comprises one or more locked nucleic acidnucleotides. Additionally or alternatively, detection of a geneticbiomarker can include a method for enriching a sample forpolynucleotides comprising a breakpoint of a fusion gene, comprising: a)contacting a probe set of claim 20 with a mixture of polynucleotidesunder hybridization conditions to produce probe-capturedpolynucleotides; and b) isolating the probe-captured polynucleotidesfrom the mixture, to produce a sample enriched with polynucleotidescomprising breakpoint fragments of the fusion gene. In one embodiment,the high affinity polynucleotide comprises one or more locked nucleicacid nucleotides. In another embodiment, the polynucleotides comprisecell-free DNA or fragmented genomic DNA. In another embodiment, themethod further comprises isolating captured polynucleotides from theprobes. In another embodiment, the method further comprises sequencingthe isolated polynucleotides. Additionally or alternatively, detectionof a genetic biomarker can include a method of diagnosing cancer in asubject comprising: a) providing a sample comprising polynucleotidesfrom a subject; b) contacting the cell-free DNA (cfDNA) from the samplewith a probe set of claim 20 under hybridization conditions to produceprobe-captured polynucleotides; c) isolating the probe-capturedpolynucleotides from the mixture, to produce a sample enriched withpolynucleotides comprising breakpoint fragments of the fusion gene; d)sequencing the isolated polynucleotides to produce sequences; e)detecting polynucleotides comprising breakpoints of fusion genes basedon the sequences; and f) diagnosing cancer based on the detection ofbreakpoint fragments. In one embodiment, the high affinitypolynucleotide comprises one or more locked nucleic acid nucleotides.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Patent Application Publication No. 2018/0120291, which is herebyincorporated by reference in its entirety. For example, detection of agenetic biomarker can include a method for analyzing a disease state ofa subject, comprising (a) using a genetic analyzer to generate geneticdata from nucleic acid molecules in biological samples of the subjectobtained at (i) two or more time points or (ii) substantially the sametime point, wherein the genetic data relates to genetic information ofthe subject, and wherein the biological samples include a cell-freebiological sample; (b) receiving the genetic data from the geneticanalyzer; (c) with one or more programmed computer processors, using thegenetic data to produce an adjusted test result in a characterization ofthe genetic information of the subject; and (d) outputting the adjustedtest result into computer memory. In some embodiments, the genetic datacomprises current sequence reads and prior sequence reads, and wherein(c) comprises comparing the current sequence reads with the priorsequence reads and updating a diagnostic confidence indicationaccordingly with respect to the characterization of the geneticinformation of the subject, which diagnostic confidence indication isindicative of a probability of identifying one or more geneticvariations in a biological sample of the subject. In some embodiments,the method further comprises obtaining a subsequent characterization andleaving as is a diagnostic confidence indication in the subsequentcharacterization for de novo information. 
In some embodiments, themethod further comprises determining a frequency of one or more geneticvariants detected in a collection of sequence reads included in thegenetic data and producing the adjusted test result at least in part bycomparing the frequency of the one or more genetic variants at the twoor more time points. In some embodiments, the method further comprisesdetermining an amount of copy number variation at one or more geneticloci detected in a collection of sequence reads included in the geneticdata and producing the adjusted test result at least in part bycomparing the amount at the two or more time points. In someembodiments, the method further comprises using the adjusted test resultto provide (i) a therapeutic intervention or (ii) a diagnosis of ahealth or disease to the subject. In some embodiments, the genetic datacomprises a first set of genetic data and a second set of genetic data,wherein the first set of genetic data is at or below a detectionthreshold and the second set of genetic data is above the detectionthreshold. In some embodiments, the detection threshold is a noisethreshold. In some embodiments, the method further comprises, in (c),adjusting a diagnosis of the subject from negative or uncertain topositive when the same genetic variants are detected in the first set ofgenetic data and the second set of genetic data in a plurality ofsampling instances or time points. In some embodiments, the methodfurther comprises, in (c), adjusting a diagnosis of the subject fromnegative or uncertain to positive in a characterization from an earliertime point when the same genetic variants are detected in the first setof genetic data at an earlier time point and in the second set ofgenetic data at a later time point. 
Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a trend in the amount of cancer polynucleotides in a biological sample from a subject over time, comprising determining, using one or more programmed computer processors, a frequency of the cancer polynucleotides at each of a plurality of time points; determining an error range for the frequency at each of the plurality of time points to provide at least a first error range at a first time point and a second error range at a second time point subsequent to the first time point; and determining whether (1) the first error range overlaps with the second error range, which overlap is indicative of stability of frequency of the cancer polynucleotides at a plurality of time points, (2) the second error range is greater than the first error range, thereby indicating an increase in frequency of the cancer polynucleotides at a plurality of time points, or (3) the second error range is less than the first error range, thereby indicating a decrease in frequency of the cancer polynucleotides at a plurality of time points. 
Additionally or alternatively, detection of a genetic biomarkercan include a method to detect one or more genetic variations and/oramount of genetic variation in a subject, comprising sequencing nucleicacid molecules in a cell-free nucleic acid sample of the subject with agenetic analyzer to generate a first set of sequence reads at a firsttime point; comparing the first set of sequence reads with at least asecond set of sequence reads obtained at least at a second time pointbefore the first time point to yield a comparison of first set ofsequence reads and the at least the second set of sequence reads; usingthe comparison to update a diagnostic confidence indication accordingly,which diagnostic confidence indication is indicative of a probability ofidentifying one or more genetic variations in a cell-free nucleic acidsample of the subject; and detecting a presence or absence of the one ormore genetic variations and/or amount of genetic variation in nucleicacid molecules in a cell-free nucleic acid sample of the subject basedon the diagnostic confidence indication. In some embodiments, the methodfurther comprises obtaining the cell-free nucleic acid Additionally oralternatively, detection of a genetic biomarker can include a method fordetecting a mutation in a cell-free nucleic acid sample of a subject,comprising: (a) determining consensus sequences by comparing currentsequence reads obtained from a genetic analyzer with prior sequencereads from a prior time period to yield a comparison, and updating adiagnostic confidence indication based on the comparison, wherein eachconsensus sequence corresponds to a unique polynucleotide among a set oftagged parent polynucleotides derived from the cell-free nucleic acidsample, and (b) based on the diagnostic confidence, generating a geneticprofile of extracellular polynucleotides in the subject, wherein thegenetic profile comprises data resulting from copy number variation ormutation analyses. 
Additionally or alternatively, detection of a geneticbiomarker can include a method to detect abnormal cellular activity,comprising: providing at least one set of tagged parent polynucleotidesderived from a biological sample of a subject; amplifying the taggedparent polynucleotides in the set to produce a corresponding set ofamplified progeny polynucleotides; using a genetic analyzer to sequencea subset of the set of amplified progeny polynucleotides to produce aset of sequencing reads; and collapsing the set of sequencing reads togenerate a set of consensus sequences by comparing current sequencereads with prior sequence reads from at least one prior time period andupdating a diagnostic confidence indication accordingly, whichdiagnostic confidence indication is indicative of a probability ofidentifying one or more genetic variations in a biological sample of thesubject, wherein each consensus sequence corresponds to a uniquepolynucleotide among the set of tagged parent polynucleotides.Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting a mutation in a cell-free orsubstantially cell free sample of a subject comprising: (a) sequencingextracellular polynucleotides from a bodily sample of the subject with agenetic analyzer; (b) for each of the extracellular polynucleotides,generating a plurality of sequencing reads; (c) filtering out reads thatfail to meet a set threshold; (d) mapping sequence reads derived fromthe sequencing onto a reference sequence; (e) identifying a subset ofmapped sequence reads that align with a variant of the referencesequence at each mappable base position; (f) for each mappable baseposition, calculating a ratio of (i) a number of mapped sequence readsthat include a variant as compared to the reference sequence, to (ii) anumber of total sequence reads for each mappable base position; and (g)using one or more programmed computer processors to compare the sequencereads with other sequence reads from 
at least one previous time pointand updating a diagnostic confidence indication accordingly, whichdiagnostic confidence indication is indicative of a probability ofidentifying the variant. Additionally or alternatively, detection of agenetic biomarker can include a method for operating a genetic testequipment, comprising: providing initial starting genetic materialobtained from a bodily sample obtained from a subject; converting doublestranded polynucleotide molecules from the initial starting geneticmaterial into at least one set of non-uniquely tagged parentpolynucleotides, wherein each polynucleotide in a set is mappable to areference sequence; and for each set of tagged parent polynucleotides:(i) amplifying the tagged parent polynucleotides in the set to produce acorresponding set of amplified progeny polynucleotides; (ii) sequencingthe set of amplified progeny polynucleotides to produce a set ofsequencing reads; (iii) collapsing the set of sequencing reads togenerate a set of consensus sequences, wherein collapsing uses sequenceinformation from a tag and at least one of: (1) sequence information ata beginning region of a sequence read, (2) an end region of the sequenceread and (3) length of the sequence read, wherein each consensussequence of the set of consensus sequences corresponds to apolynucleotide molecule among the set of tagged parent polynucleotides;and (iv) analyzing the set of consensus sequences for each set of taggedparent molecules; (v) comparing current sequence reads with priorsequence reads from at least one other time point; and (vi) updating adiagnostic confidence indication accordingly, which diagnosticconfidence indication is indicative of a probability of identifying oneor more genetic variations in a bodily sample of the subject.Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting one or more genetic variants in asubject, comprising: (a) obtaining nucleic acid molecules from one ormore 
cell-free biological samples of said subject; (b) assaying saidnucleic acid molecules to produce a first set of genetic data and asecond set of genetic data, wherein said first set of genetic dataand/or said second set of genetic data is within a detection threshold;(c) comparing said first set of genetic data to said second set ofgenetic data to identify said one or more genetic variants in said firstset of genetic data or said second set of genetic data; and (d) based onsaid one or more genetic variants identified in (c), using one or moreprogrammed computer processors to update a diagnostic confidenceindication for identifying said one or more genetic variants in acell-free biological sample of said subject. Additionally oralternatively, detection of a genetic biomarker can include a method forcalling a genetic variant in cell-free deoxyribose nucleic acids (cfDNA)from a subject comprising: (a) using a DNA sequencing system to sequencecfDNA from a sample taken at a first time point from a subject; (b)detecting a genetic variant in the sequenced cfDNA from the first timepoint, wherein the genetic variant is detected at a level below adiagnostic limit; (c) using the DNA sequencing system to sequence cfDNAfrom a sample taken from the subject at one or more subsequent timepoints; (d) detecting the genetic variant in the sequenced cfDNA fromthe one or more subsequent time points, wherein the genetic variant isdetected at level below the diagnostic limit; (e) calling the samples aspositive for the genetic variant based on detecting the genetic variantbelow the diagnostic limit in samples taken at a plurality of the timepoints. 
Additionally or alternatively, detection of a genetic biomarker can include a method for calling a genetic variant in cell-free deoxyribose nucleic acids (cfDNA) from a subject comprising: (a) using a deoxyribonucleic acid (DNA) sequencing system to sequence cfDNA from a sample from a subject; (b) detecting a genetic variant in the sequenced cfDNA, wherein the genetic variant is detected at a level below a diagnostic limit; (c) using the DNA sequencing system to sequence cfDNA from the sample taken from the subject, wherein the sample is re-sequenced one or more times; (d) detecting the genetic variant in the sequenced cfDNA from the one or more re-sequenced samples, wherein the genetic variant is detected at a level below the diagnostic limit; and (e) calling the samples as positive for the genetic variant based on detecting the genetic variant below the diagnostic limit in re-sequenced samples.
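
By way of illustration only, the comparison of error ranges at two time points described above for detecting a trend in the frequency of cancer polynucleotides might be sketched as follows. This is a minimal sketch under stated assumptions: the error range is assumed to be a normal-approximation binomial interval around the observed variant frequency, which the text above does not specify, and the function names `error_range` and `classify_trend` and the parameter `z` are hypothetical.

```python
import math


def error_range(variant_reads, total_reads, z=1.96):
    """Return an assumed (low, high) error range for a variant frequency.

    Uses a normal-approximation binomial interval; this choice of
    interval is an assumption made for illustration.
    """
    p = variant_reads / total_reads
    half_width = z * math.sqrt(p * (1 - p) / total_reads)
    return (max(0.0, p - half_width), min(1.0, p + half_width))


def classify_trend(first, second):
    """Compare two (low, high) ranges from successive time points."""
    low1, high1 = first
    low2, high2 = second
    if low2 > high1:
        return "increase"   # second range lies entirely above the first
    if high2 < low1:
        return "decrease"   # second range lies entirely below the first
    return "stable"         # ranges overlap


earlier = error_range(10, 10000)    # about 0.1% frequency
later = error_range(100, 10000)     # about 1% frequency
trend = classify_trend(earlier, later)
```

In this sketch, overlapping ranges are read as stability, and a change is reported only when the two ranges are disjoint.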

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0240972, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include systems and methods for determining gene fusion by determining a fused read containing sequencing data of at least a portion of a fused chromosome DNA molecule; determining a predetermined point on the genome with at least one mapped portion of the fused read clipped at the predetermined point (a breakpoint); identifying two mapped read portions from two breakpoints (breakpoint pair) as a potential fusion candidate; creating one or more fusion sets based on breakpoint pairs and clustering the fusion sets into one or more fusion clusters; and identifying each fusion cluster meeting a predetermined criterion as a gene fusion. Additionally or alternatively, detection of a genetic biomarker can include a method for processing genetic sequence read data from a sample, the method comprising: determining a fused read containing sequencing data of at least a portion of a fused chromosome DNA molecule; determining a predetermined point on the genome with at least one mapped portion of the fused read clipped at the predetermined point (a breakpoint); identifying two mapped read portions from two breakpoints (breakpoint pair) as a potential fusion candidate; creating one or more fusion sets based on breakpoint pairs and clustering the fusion sets into one or more fusion clusters; and identifying each fusion cluster meeting a predetermined criterion as a gene fusion. In some embodiments, the method comprises assigning a unique molecule or read identifier (read ID) to each read. In some embodiments, the method comprises clipping each mapped portion of the reads from one or both sides. 
In some embodiments, the breakpoints are independent of the reads in identity and are identified by a sign, a chromosome and a position. In some embodiments, the breakpoints keep statistics including a number of reads and molecules that are clipped or split at the breakpoint, and a number of wild-type reads and molecules that pass over the breakpoint. In some embodiments, the method comprises selecting every two mapped read portions with common read IDs that belong to two breakpoints with appropriate signs as a potential fusion candidate. In some embodiments, the potential fusion candidate location in the original read before mapping shows the read portion as originally located next to each other. In some embodiments, the method comprises checking if read portions are mapped on one strand for differences in the breakpoints' signs. In some embodiments, the method comprises tracking fusion set statistics. Additionally or alternatively, detection of a genetic biomarker can include a system to analyze genetic information, comprising a DNA sequencer; a processor coupled to the DNA sequencer, the processor running computer code to process genetic sequence read data from a sample, the computer code comprising instructions for: determining a fused read containing sequencing data of a portion of a fused chromosome DNA molecule; determining at least a predetermined point on the genome with at least one mapped portion of the fused read clipped at the predetermined point (a breakpoint); identifying two mapped read portions from two breakpoints (breakpoint pair) as a potential fusion candidate; creating one or more fusion sets based on breakpoint pairs and clustering the fusion sets into one or more fusion clusters; and identifying each fusion cluster meeting a predetermined criterion as a gene fusion. 
Additionally oralternatively, detection of a genetic biomarker can include a methodcomprising: sequencing DNA molecules with a DNA sequencer to generate acollection of sequences; mapping the collection of sequences to areference genome; identifying fused reads from the mapped collection,wherein a fused read contains sub-sequences, wherein a firstsub-sequence maps to a first genetic locus and a second sub-sequencemaps to a second, distinct genetic locus; for each fused read,identifying a first breakpoint at the first genetic locus and a secondbreakpoint at the second genetic locus, wherein a breakpoint is a pointon the reference genome where a sequence of a fused read is clipped, andwherein the first and second breakpoints form a breakpoint pair;generating sets of fused reads, each set comprising fused reads havingthe same breakpoint pair; clustering sets of fused reads, wherein eachcluster is formed from sets of fused reads having first breakpointswithin a first predetermined nucleotide distance and second breakpointswithin a second predetermined nucleotide distance; and determining agene fusion for one or more clusters, wherein a gene fusion for acluster has, as a first fusion gene breakpoint, a breakpoint selectedfrom the first breakpoints in the cluster and, as a second fusion genebreakpoint, a breakpoint selected from the second breakpoints in thecluster, and wherein the first and second fusion gene breakpoints areeach selected based on selection criteria. 
Additionally oralternatively, detection of a genetic biomarker can include a methodcomprising: sequencing a plurality of DNA molecules with a DNAsequencer; tagging each of the plurality of sequences molecules with anidentifier; mapping each tagged sequence to a reference genome;identifying clipped reads from the mapped tagged sequences, wherein aclipped read is a tagged sequence containing a mapped portion and aclipped portion, wherein the mapped portion maps to a genetic locus andthe clipped portion does not map to the genetic locus; determining abreakpoint of each clipped read, wherein a breakpoint is a point on thereference genome where a sequence of a clipped read is clipped; creatingbreakpoint sets, each breakpoint set comprising identifiers of clippedreads having the same breakpoint; creating sets of breakpoint pairs bycomparing pairs of breakpoint sets, each set of breakpoint pairsincluding identifiers present in both members of a compared pair ofbreakpoint sets; clustering sets of breakpoint pairs, wherein eachcluster includes sets of breakpoint pairs having a first breakpoint ofthe pair within a first predetermined genetic distance and a secondbreakpoint of the pair within a second predetermined genetic distance;and determining a gene fusion for one or more of the clusters, wherein agene fusion for a cluster has, as a first fusion gene breakpoint, abreakpoint selected from the first breakpoints in the cluster and, as asecond fusion gene breakpoint, a breakpoint selected from the secondbreakpoints in the cluster, and wherein the first and second fusion genebreakpoints are each selected based on a selection criteria. In someembodiments, the selection criteria include the breakpoint having themost fused reads in the cluster. 
Additionally or alternatively,detection of a genetic biomarker can include a method for identifying afusion gene breakpoint, the method comprising: determining a fused readcontaining sequencing data of at least a portion of a fused chromosomeDNA molecule; determining a predetermined point on the genome with leastone mapped portion of the fused read clipped at the predetermined point(a breakpoint); identifying two mapped read portions from twobreakpoints (breakpoint pair) as a potential fusion candidate; creatingone or more fusion sets based on breakpoint pairs and clustering thefusion sets into one or more fusion clusters; identifying each fusioncluster meeting a predetermined criterion as a gene fusion, andidentifying a breakpoint of the gene fusion as the fusion genebreakpoint. Additionally or alternatively, detection of a geneticbiomarker can include a method for diagnosing a condition in a subject,the method comprising: determining a fused read containing sequencingdata of at least a portion of a fused chromosome DNA molecule;determining a predetermined point on the genome with least one mappedportion of the fused read clipped at the predetermined point (abreakpoint); identifying two mapped read portions from two breakpoints(breakpoint pair) as a potential fusion candidate; creating one or morefusion sets based on breakpoint pairs and clustering the fusion setsinto one or more fusion clusters; and identifying each fusion clustermeeting a predetermined criterion as a gene fusion, wherein said genefusion is indicative of the condition.
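
By way of a non-limiting illustration, the grouping of fused reads into fusion sets, the clustering of those sets by breakpoint proximity, and the selection of the breakpoint pair having the most fused reads could be sketched as follows. This is a minimal sketch and not the implementation of the incorporated publication; the function name `call_fusions`, the parameters `max_dist` and `min_reads`, the greedy clustering strategy, and the example reads are all assumptions introduced here for illustration.

```python
from collections import defaultdict


def call_fusions(fused_reads, max_dist=10, min_reads=2):
    """Cluster fused reads by breakpoint pair and report candidate fusions.

    fused_reads: iterable of (read_id, bp1, bp2), where each breakpoint is a
    (chromosome, position, sign) tuple. All thresholds are hypothetical.
    """
    # Fusion sets: read IDs that share exactly the same breakpoint pair.
    fusion_sets = defaultdict(set)
    for read_id, bp1, bp2 in fused_reads:
        fusion_sets[(bp1, bp2)].add(read_id)

    def near(a, b):
        # Same chromosome, same sign, and within the allowed distance.
        return a[0] == b[0] and a[2] == b[2] and abs(a[1] - b[1]) <= max_dist

    # Greedy clustering (an assumed strategy): a fusion set joins the first
    # cluster whose seed pair is near on both sides, else starts a new one.
    clusters = []
    for pair in sorted(fusion_sets):
        for cluster in clusters:
            seed = cluster[0]
            if near(pair[0], seed[0]) and near(pair[1], seed[1]):
                cluster.append(pair)
                break
        else:
            clusters.append([pair])

    fusions = []
    for cluster in clusters:
        support = sum(len(fusion_sets[p]) for p in cluster)
        if support >= min_reads:  # the "predetermined criterion" assumed here
            # Selection criterion: the breakpoint pair with the most fused reads.
            best = max(cluster, key=lambda p: len(fusion_sets[p]))
            fusions.append({"breakpoints": best, "supporting_reads": support})
    return fusions


# Hypothetical example reads (not data from the source).
example_reads = [
    ("r1", ("chr2", 1000, "+"), ("chr5", 5000, "-")),
    ("r2", ("chr2", 1000, "+"), ("chr5", 5000, "-")),
    ("r3", ("chr2", 1003, "+"), ("chr5", 5000, "-")),
    ("r4", ("chr9", 10, "+"), ("chr1", 99, "-")),
]
fusions = call_fusions(example_reads)
```

In this sketch the read-count threshold stands in for the predetermined criterion; other criteria (e.g., molecule counts or wild-type coverage at the breakpoint) could be substituted.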

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0240973, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method comprising: (a) obtaining sequencing reads of deoxyribonucleic acid (DNA) molecules of a cell-free bodily fluid sample of a subject; (b) generating from the sequence reads a first data set comprising for each genetic locus in a plurality of genetic loci a quantitative measure related to sequencing read coverage (“read coverage”); (c) correcting the first data set by performing saturation equilibrium correction and probe efficiency correction; (d) determining a baseline read coverage for the first data set, wherein the baseline read coverage relates to saturation equilibrium and probe efficiency; and (e) determining a copy number state for each genetic locus in the plurality of genetic loci relative to the baseline read coverage. In some embodiments, the first data set comprises, for each genetic locus in a plurality of genetic loci, a quantitative measure related to (i) guanine-cytosine content (“GC content”) of the genetic locus. In some embodiments, the method comprises, prior to (c), removing from the first data set genetic loci that are high-variance genetic loci, wherein removing comprises: (i) fitting a model relating the quantitative measures related to guanine-cytosine content and the quantitative measures of sequencing read coverage of the genetic loci; and (ii) removing from the genetic loci at least 10% of the genetic loci, wherein the removing the genetic loci comprises removing genetic loci that most differ from the model, thereby providing the first data set of baselining genetic loci. In some embodiments, the method comprises removing at least 45% of the genetic loci. 
In some embodiments, determining a copy number state comprises comparing the read coverage of the genetic loci to the baseline read coverage. In some embodiments, the cell-free bodily fluid is selected from the group consisting of serum, plasma, urine, and cerebrospinal fluid. In some embodiments, the read coverage is determined by mapping the sequencing reads to a reference genome. In some embodiments, obtaining the sequencing reads comprises ligating adaptors to the DNA molecules from the cell-free bodily fluid from the subject. In some embodiments, the DNA molecules are duplex DNA molecules and the adaptors are ligated to the duplex DNA molecules such that each adaptor differently tags complementary strands of the DNA molecule to provide tagged strands. In some embodiments, determining the quantitative measure related to the probability that a strand of DNA derived from the genetic locus is represented within the sequencing reads comprises sorting sequencing reads into paired reads and unpaired reads, wherein (i) each paired read corresponds to sequence reads generated from a first tagged strand and a second differently tagged complementary strand derived from a double-stranded polynucleotide molecule in said set, and (ii) each unpaired read represents a first tagged strand having no second differently tagged complementary strand derived from a double-stranded polynucleotide molecule represented among said sequence reads in said set of sequence reads. In some embodiments, the method further comprises determining quantitative measures of (i) said paired reads and (ii) said unpaired reads that map to each of one or more genetic loci to determine a quantitative measure related to total double-stranded DNA molecules in said sample that map to each of said one or more genetic loci based on said quantitative measure related to paired reads and unpaired reads mapping to each locus. In some embodiments, the adaptors comprise barcode sequences. 
In some embodiments, determining the read coverage comprises collapsing the sequencing reads based on position of the mapping of the sequencing reads to the reference genome and the barcode sequences. In some embodiments, the genetic loci comprise one or more oncogenes. In some embodiments, a method comprises determining that at least a subset of the baselining genetic loci has undergone copy number alteration in the tumor cells of the subject by determining relative quantities of variants within the baselining genetic loci for which the germline genome of the subject is heterozygous. In some embodiments, the relative quantities of the variants are not approximately equal. In some embodiments, baselining genetic loci for which the relative quantities of the variants are not approximately equal are removed from the baselining genetic loci, thereby providing allelic-frequency corrected baselining genetic loci. In some embodiments, the allelic-frequency corrected baselining genetic loci are used as the baselining loci in the methods of any one of the preceding claims. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: receiving into memory sequencing reads of deoxyribonucleic acid (DNA) molecules of a cell-free bodily fluid sample of a subject; executing code with a computer processor to perform the following steps: generating from the sequence reads a first data set comprising for each genetic locus in a plurality of genetic loci a quantitative measure related to sequencing read coverage (“read coverage”); correcting the first data set by performing saturation equilibrium correction and probe efficiency correction; determining a baseline read coverage for the first data set, wherein the baseline read coverage relates to saturation equilibrium and probe efficiency; and determining a copy number state for each genetic locus in the plurality of genetic loci relative to the baseline read coverage. 
Additionally or alternatively, detection of a genetic biomarker can include a system comprising: a network; a database comprising computer memory configured to store nucleic acid (e.g., DNA) sequence data which are connected to the network; a bioinformatics computer comprising a computer memory and one or more computer processors, which computer is connected to the network; wherein the computer further comprises machine-executable code which, when executed by the one or more computer processors, copies nucleic acid (e.g., DNA) sequence data stored on the database, writes the copied data to memory in the bioinformatics computer and performs steps including: generating from the nucleic acid (e.g., DNA) sequence data a first data set comprising for each genetic locus in a plurality of genetic loci a quantitative measure related to sequencing read coverage (“read coverage”); correcting the first data set by performing saturation equilibrium correction and probe efficiency correction; determining a baseline read coverage for the first data set, wherein the baseline read coverage relates to saturation equilibrium and probe efficiency; and determining a copy number state for each genetic locus in the plurality of genetic loci relative to the baseline read coverage. In some embodiments, the database is connected to a DNA sequencer.
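
As a non-limiting illustration of fitting a model relating GC content to read coverage, removing the loci that most differ from the model, and determining a copy number state relative to a baseline, a simplified sketch is given below. It does not model the saturation equilibrium correction or the probe efficiency correction, which are defined in the incorporated publication; the straight-line GC model, the median baseline, the ratio thresholds `gain` and `loss`, and the example loci are assumptions made here for illustration only.

```python
import math
from statistics import median


def copy_number_states(loci, drop_fraction=0.10, gain=1.25, loss=0.75):
    """Return (baseline, states) for loci given as dicts with the keys
    'name', 'gc' (fraction between 0 and 1) and 'coverage'.

    The linear model, median baseline and thresholds are hypothetical.
    """
    n = len(loci)
    mean_gc = sum(l["gc"] for l in loci) / n
    mean_cov = sum(l["coverage"] for l in loci) / n

    # Least-squares line relating GC content to read coverage.
    sxx = sum((l["gc"] - mean_gc) ** 2 for l in loci)
    sxy = sum((l["gc"] - mean_gc) * (l["coverage"] - mean_cov) for l in loci)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_cov - slope * mean_gc

    def residual(locus):
        return abs(locus["coverage"] - (intercept + slope * locus["gc"]))

    # Remove at least drop_fraction of the loci: those that most differ
    # from the model. The remainder are the baselining loci.
    n_drop = math.ceil(drop_fraction * n)
    baselining = sorted(loci, key=residual)[: n - n_drop]

    # Baseline read coverage, taken here as the median over baselining loci.
    baseline = median(l["coverage"] for l in baselining)

    states = {}
    for locus in loci:
        ratio = locus["coverage"] / baseline
        if ratio >= gain:
            states[locus["name"]] = "gain"
        elif ratio <= loss:
            states[locus["name"]] = "loss"
        else:
            states[locus["name"]] = "neutral"
    return baseline, states


# Hypothetical example: eight unremarkable loci, one amplified, one deleted.
example_loci = [
    {"name": f"locus{i}", "gc": 0.35 + 0.02 * i, "coverage": 100.0}
    for i in range(8)
]
example_loci.append({"name": "amplified", "gc": 0.51, "coverage": 200.0})
example_loci.append({"name": "deleted", "gc": 0.53, "coverage": 50.0})
baseline, states = copy_number_states(example_loci)
```

A larger `drop_fraction` (e.g., 0.45) corresponds to the embodiments in which at least 45% of the genetic loci are removed before baselining.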

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0260590, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method comprising: (a) sequencing polynucleotides from cancer cells from a biological sample of a subject; (b) identifying and quantifying somatic mutations in the polynucleotides; (c) developing a profile of tumor heterogeneity in the subject indicating the presence and relative quantity of a plurality of the somatic mutations in the polynucleotides, wherein different relative quantities indicate tumor heterogeneity; and (d) determining a therapeutic intervention for a cancer exhibiting the tumor heterogeneity, wherein the therapeutic intervention is effective against a cancer having the profile of tumor heterogeneity determined. In some embodiments, the cancer cells are spatially distinct. In some embodiments, the therapeutic intervention is more effective against a cancer presenting with the plurality of somatic mutations than it is against a cancer presenting with any one, but not all, of the somatic mutations. In some embodiments, the method further comprises: (e) monitoring changes in tumor heterogeneity in the subject over time and determining different therapeutic interventions over time based on the changes. In some embodiments, the method further comprises: (e) displaying the therapeutic intervention. In some embodiments, the method further comprises: (e) implementing the therapeutic intervention. In some embodiments, the method further comprises: (e) generating a phylogeny of tumor evolution based on the tumor profile; wherein determining the therapeutic intervention takes into account the phylogeny. In some embodiments, determining is performed with the aid of a computer-executed algorithm. 
In some embodiments, sequence reads generated by sequencing are subject to noise reduction before identifying and quantifying. In some embodiments, noise reduction comprises molecular tracking of sequences generated from a single polynucleotide in the sample. In some embodiments, determining a therapeutic intervention takes into account the relative frequencies of the tumor-related genetic alterations. In some embodiments, the therapeutic intervention comprises administering, in combination or in series, a plurality of drugs, wherein each drug is relatively more effective against a cancer presenting with a different one of somatic mutations that occur at different relative frequency. In some embodiments, a drug that is relatively more effective against a cancer presenting with a somatic mutation occurring at higher relative frequency is administered in higher amount. In some embodiments, the drugs are delivered at doses that are stratified to reflect the relative amounts of the variants in the DNA. In some embodiments, cancers presenting with at least one of the genetic variants are resistant to at least one of the drugs. In some embodiments, determining a therapeutic intervention takes into account the tissue of origin of the cancer. In some embodiments, the therapeutic intervention is determined based on a database of interventions shown to be therapeutic for cancers having tumor heterogeneity characterized by each of the somatic mutations. Additionally or alternatively, detection of a genetic biomarker can include a method comprising providing a therapeutic intervention for a subject having a cancer having a tumor profile from which tumor heterogeneity can be inferred, wherein the therapeutic intervention is effective against cancers with the tumor profile. In some embodiments, the tumor profile indicates relative frequency of a plurality of somatic mutations. 
In some embodiments, the method further comprises monitoring changes in the relative frequencies in the subject over time and determining different therapeutic interventions over time based on the changes. In some embodiments, the therapeutic intervention is more effective against a cancer presenting with each of the somatic mutations than it is against a cancer presenting with any one, but not all, of the somatic mutations. In some embodiments, the therapeutic intervention comprises administering, in combination or in series, a plurality of drugs, wherein each drug is relatively more effective against a cancer presenting with a different one of somatic mutations that occur at different relative frequency. In some embodiments, a drug that is relatively more effective against a cancer presenting with a somatic mutation occurring at higher relative frequency is administered in higher amount. In some embodiments, the drugs are delivered at doses that are stratified to reflect the relative amounts of the variants in the DNA. In some embodiments, cancers presenting with at least one of the genetic variants are resistant to at least one of the drugs. Additionally or alternatively, detection of a genetic biomarker can include a system comprising a computer readable medium comprising machine-executable code that, upon execution by a computer processor, implements a method comprising: (a) receiving into memory sequence reads of polynucleotides mapping to a genetic locus; (b) determining, among said sequence reads, identity of bases that are different than a base of a reference sequence at the locus of the total number of sequence reads mapping to a locus; (c) reporting the identity and relative quantity of the determined bases and their location in the genome; and (d) inferring heterogeneity of a given sample based on information in (c). 
In some embodiments, the method implemented further comprises receiving into memory sequence reads derived from samples at a plurality of different times and calculating a difference in relative amount and identity of a plurality of bases between the two samples. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: (a) performing biomolecular analysis of biomolecular polymers from disease cells (e.g., spatially distinct disease cells) from a subject; (b) identifying and quantifying biomolecular variants in the biomolecular macromolecules; (c) developing a profile of disease cell heterogeneity in the subject indicating the presence and relative quantity of a plurality of the variants in the biomolecular macromolecules, wherein different relative quantities indicate disease cell heterogeneity; and (d) determining a therapeutic intervention for a disease exhibiting the disease cell heterogeneity, wherein the therapeutic intervention is effective against a disease having the profile of disease cell heterogeneity determined. In some embodiments, the disease cells are spatially distinct disease cells. 
In some embodiments, the therapeutic intervention is determined based on a database of interventions shown to be therapeutic for cancers having tumor heterogeneity characterized by each of the somatic mutations. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting disease cell heterogeneity in a subject comprising: a) quantifying polynucleotides that bear a sequence variant at each of a plurality of genetic loci in polynucleotides from a sample from the subject, wherein the sample comprises polynucleotides from somatic cells and from disease cells; b) determining for each locus a measure of copy number variation (CNV) for polynucleotides bearing the sequence variant; c) determining for each locus a weighted measure of quantity of polynucleotides bearing a sequence variant at the locus as a function of CNV at the locus; and d) comparing the weighted measures at each of the plurality of loci, wherein different weighted measures indicate disease cell heterogeneity. In some embodiments, the disease cells are tumor cells. In some embodiments, polynucleotides comprise cfDNA. Additionally or alternatively, detection of a genetic biomarker can include a method of inferring a measure of burden of DNA from cells undergoing cell division in a sample comprising measuring copy number variation induced by proximity of one or more genomic loci to cells' origins of replication, wherein increased CNV indicates cells undergoing cell division. In some embodiments, the burden is measured in cell-free DNA. In some embodiments, the measure of burden relates to the fraction of tumor cells or genome-equivalents of DNA from tumor cells in the sample. In some embodiments, CNV due to proximity to origins of replication is inferred from a set of control samples or cell lines. In some embodiments, a hidden Markov model, regression model, principal component analysis-based model, or genotype-modified model is used to approximate variations due to origins of replication. 
In some embodiments, the measure of burden is presence or absence of cells undergoing cell division. In some embodiments, proximity is within 1 kb of an origin of replication. Additionally or alternatively, detection of a genetic biomarker can include a method of increasing sensitivity and/or specificity of determining gene-related copy-number variations by ameliorating the effect of variations due to proximity to origins of replication. In some embodiments, the method comprises measuring CNV at a locus, determining amount of CNV due to proximity of the locus to an origin of replication, and correcting the measured CNV to reflect genomic CNV, e.g., by subtracting amount of CNV attributable to cell division. In some embodiments, the genomic data is obtained from cell-free DNA. In some embodiments, the measure of burden relates to the fraction of tumor cells or genome-equivalents of DNA in a sample. In some embodiments, variations due to origins of replication are inferred from a set of control samples or cell lines. In some embodiments, a hidden Markov model, regression model, principal component analysis-based model, or genotype-modified model is used to approximate variations due to origins of replication.
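
As a non-limiting illustration of correcting a measured CNV by subtracting the amount attributable to proximity to an origin of replication, a minimal sketch is given below. It assumes, for illustration only, that copy number is expressed as a ratio with 1.0 as the neutral value and that the replication-related component at each locus is estimated as the mean deviation from neutral across control samples; the function names and the example values are hypothetical.

```python
from statistics import mean


def replication_component(control_profiles):
    """Estimate, per locus, the copy-ratio deviation attributable to
    proximity to origins of replication, as the mean deviation from the
    neutral ratio (1.0) across control samples (an assumed estimator)."""
    loci = control_profiles[0].keys()
    return {
        locus: mean(profile[locus] for profile in control_profiles) - 1.0
        for locus in loci
    }


def correct_cnv(measured, component):
    """Subtract the replication-related component from the measured copy
    ratio at each locus; loci without an estimate are left unchanged."""
    return {
        locus: ratio - component.get(locus, 0.0)
        for locus, ratio in measured.items()
    }


# Hypothetical control profiles and test sample (not data from the source).
controls = [
    {"near_origin": 1.1, "far_from_origin": 1.0},
    {"near_origin": 1.3, "far_from_origin": 1.0},
]
component = replication_component(controls)
corrected = correct_cnv({"near_origin": 1.7, "far_from_origin": 1.0}, component)
```

A regression model or hidden Markov model, as recited above, could replace the simple per-locus mean used in this sketch.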

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0061072, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and systems for detection of single-nucleotide variations (SNVs) from somatic sources in a cell-free biological sample of a subject, such as in a mixture of nucleic acid molecules from somatic and germline sources. In some embodiments, the systems and methods detect single-nucleotide variations (SNVs) from somatic sources in a cell-free biological sample of a subject by generating training data with class labels; forming a machine learning unit having one output for each of adenine (A), cytosine (C), guanine (G), and thymine (T) base calls, respectively; training the machine learning unit with a training set of biological samples; and applying the machine learning unit to detect the SNVs from somatic sources in the cell-free biological sample, wherein the cell-free biological sample may comprise a mixture of nucleic acid molecules (e.g., deoxyribonucleic acid (DNA)) from somatic and germline sources, e.g., cells comprising somatic mutations and germline DNA.
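
As a non-limiting illustration of a machine learning unit having one output for each of the A, C, G, and T base calls, a minimal softmax classifier is sketched below. The choice of a softmax regression, the use of per-base read fractions as the four input features, the training hyperparameters, and the toy training data are all assumptions made here for illustration; the incorporated publication defines the actual unit, features, and training data.

```python
import math

BASES = "ACGT"


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def logits(model, x):
    weights, bias = model
    return [sum(weights[k][j] * x[j] for j in range(4)) + bias[k] for k in range(4)]


def train(samples, epochs=300, lr=0.5):
    """Train a four-output softmax unit by stochastic gradient descent.

    samples: list of (features, label), where features are the fractions of
    reads supporting A, C, G and T at a position (hypothetical features) and
    label is the class label, one of 'A', 'C', 'G', 'T'.
    """
    weights = [[0.0] * 4 for _ in range(4)]
    bias = [0.0] * 4
    for _ in range(epochs):
        for x, label in samples:
            probs = softmax(logits((weights, bias), x))
            for k in range(4):
                # Gradient of the cross-entropy loss for output k.
                err = probs[k] - (1.0 if BASES[k] == label else 0.0)
                for j in range(4):
                    weights[k][j] -= lr * err * x[j]
                bias[k] -= lr * err
    return weights, bias


def call_base(model, x):
    """Return the most probable base call and its probability."""
    probs = softmax(logits(model, x))
    k = max(range(4), key=probs.__getitem__)
    return BASES[k], probs[k]


# Toy labelled training data (hypothetical): one dominant base per sample.
training = []
for k, base in enumerate(BASES):
    features = [0.05] * 4
    features[k] = 0.85
    training.append((features, base))

model = train(training)
called, confidence = call_base(model, [0.1, 0.7, 0.1, 0.1])
is_snv = called != "A"  # compare the call against a hypothetical reference base
```

In practice, the features could also encode base quality, strand, or fragment-level information, and the training set would be drawn from labelled biological samples rather than the toy vectors used here.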

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0058332, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method to detect a somatic or germline variant, comprising providing a predetermined genomic DNA (gDNA) to an assay mixture, capturing a sample of a subject's genetic information using a gene analyzer, and detecting genetic variants from the genetic information; and classifying a variant as from a germline source if present in gDNA derived molecules having lengths longer than cell-free DNA (cfDNA) derived molecules. In some embodiments, the gDNA has a fragment length of more than about 200 bases. In some embodiments, the gDNA has a fragment length of at least 400 bases or at least 500 bases. In some embodiments, gDNA fragment length is higher than the cfDNA fragment length distribution. In some embodiments, the gDNA is added to the assay mixture. In some embodiments, the gDNA is left in the assay mixture after a filtering operation. In some embodiments, the gDNA is left in the assay mixture after a centrifugation operation. 
In some embodiments, approximately 1% to 5% gDNA is added to the assay mixture. In some embodiments, at least 1% gDNA is added to the assay mixture. Additionally or alternatively, detection of a genetic biomarker can include a method comprising providing a sample comprising both genomic DNA (gDNA) and cell-free DNA (cfDNA) from a subject; determining the subject's germline genotype at at least one genetic locus from the gDNA; determining a quantitative measure of at least one genetic variant at each genetic locus in the cfDNA; determining whether the quantitative measure of the genetic variant is or is not consistent with germline genotype; and calling the genetic variant as a germline variant if the quantitative measure is consistent with germline genotype, or as a somatic mutant if the quantitative measure is not consistent with the germline genotype. Additionally or alternatively, detection of a genetic biomarker can include a method comprising determining a quantitative measure of a genetic variant detected in cell-free DNA (cfDNA) from a subject; determining that the measure is consistent with a heterozygous genotype in the subject; determining a probable genotype of the subject at the locus from genomic DNA (gDNA); comparing the genotype at the locus from gDNA with the variant detected in the cfDNA; and calling the variant as a somatic mutation if the variant detected in the cfDNA is not consistent with the genotype at the locus from gDNA. In some embodiments, the variant is called as a somatic mutation if the genotype at the locus from gDNA is determined to be homozygous. In some embodiments, the variant is called as a somatic mutation if the genotype at the locus from gDNA is determined to be heterozygous with a confidence selected from the group consisting of: at least 70%, at least 80%, at least 90%, at least 95%, or at least 99%. In some embodiments, determining a quantitative measure of a genetic variant comprises sequencing the cfDNA. 
In some embodiments, determining the probable genotype of the subject comprises sequencing genomic DNA from the subject. In some embodiments, the sequencing is selected from the group consisting of: targeted sequencing, single molecule real-time sequencing, exon sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, capillary electrophoresis, gel electrophoresis, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof.
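
As a non-limiting illustration of calling a variant as germline or somatic by comparing its quantitative measure in cfDNA with the germline genotype determined from gDNA, a minimal sketch is given below. It assumes, for illustration only, that the quantitative measure is a variant allele fraction, that a diploid germline genotype implies an expected fraction of 0.5 (heterozygous) or 1.0 (homozygous for the variant), and that consistency is judged with a fixed tolerance; the function names and the `tolerance` parameter are hypothetical.

```python
def expected_fraction(genotype, alt):
    """Fraction of germline alleles carrying the variant allele, e.g.
    ('A', 'G') with alt 'G' gives 0.5 and ('G', 'G') gives 1.0."""
    return genotype.count(alt) / len(genotype)


def classify_variant(cfdna_vaf, gdna_genotype, alt, tolerance=0.1):
    """Call a cfDNA variant 'germline' if its allele fraction is consistent
    with the germline genotype from gDNA, and 'somatic' otherwise.

    cfdna_vaf: variant allele fraction observed in cfDNA (0 to 1).
    gdna_genotype: tuple of germline alleles at the locus, from gDNA.
    tolerance: allowed deviation from the expected fraction (an assumption).
    """
    expected = expected_fraction(gdna_genotype, alt)
    if expected > 0 and abs(cfdna_vaf - expected) <= tolerance:
        return "germline"
    return "somatic"


# Hypothetical examples (not data from the source).
het_call = classify_variant(0.48, ("A", "G"), "G")
hom_ref_call = classify_variant(0.02, ("A", "A"), "G")
hom_alt_call = classify_variant(0.97, ("G", "G"), "G")
```

A fixed tolerance is the simplest possible consistency test; a statistical test on read counts, or a genotype confidence threshold such as those recited above, could be used instead.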

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2018/119452, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include methods, compositions and systems for analyzing anucleic acid population comprising at least two forms of nucleic acidselected from double-stranded DNA, single-stranded DNA andsingle-stranded RNA. In some embodiments the method comprises (a)linking at least one of the forms of nucleic acid with at least one tagnucleic acid to distinguish the forms from one another, (b) amplifyingthe forms of nucleic acid at least one of which is linked to at leastone nucleic acid tag, wherein the nucleic acids and linked nucleic acidtag, if present, are amplified, to produce amplified nucleic acids, ofwhich those amplified from the at least one form are tagged; (c)assaying sequence data of the amplified nucleic acids at least some ofwhich are tagged; and (d) decoding tag nucleic acid molecules of theamplified nucleic acids to reveal the forms of nucleic acids in thepopulation providing an original template for the amplified nucleicacids linked to the tag nucleic acid molecules for which sequence datahas been assayed. In some embodiments, the method further comprisesenriching for at least one of the forms relative to one or more of theother forms. In some embodiments, at least 70% of the molecules of eachform of nucleic acid in the population are amplified in step (b). Insome embodiments, at least three forms of nucleic acid are present inthe population and at least two of the forms are linked to different tagnucleic acid forms distinguishing each of the three forms from oneanother. In some embodiments each of the at least three forms of nucleicacid in the population is linked to a different tag. 
In someembodiments, each molecule of the same form is linked to a tagcomprising the same identifying information tag (e.g., a tag with thesame or comprising the same sequence). In some embodiments, molecules ofthe same form are linked to different types of tags. In some embodimentsstep (a) comprises: subjecting the population to reverse transcriptionwith a tagged primer, wherein the tagged primer is incorporated intocDNA generated from RNA in the population. In some embodiments, thereverse transcription is sequence-specific. In some embodiments, thereverse transcription is random. In some embodiments, the method furthercomprises degrading RNA duplexed to the cDNA. In some embodiments, themethod further comprises separating single-stranded DNA fromdouble-stranded DNA and ligating nucleic acid tags to thedouble-stranded DNA. In some embodiments, the single-stranded DNA isseparated by hybridization to one or more capture probes. In someembodiments, the method further comprises differentially taggingsingle-stranded DNA with a single-stranded tag using a ligase thatfunctions on single stranded nucleic acids, and double-stranded DNA withdouble-stranded adapters using ligase that functions on double-strandednucleic acids. In some embodiments, the method further comprises beforeassaying, pooling tagged nucleic acids comprising different forms ofnucleic acid. In some embodiments, the method further comprisesanalyzing the pools of partitioned DNA separately in individual assays.The assays can be the same, substantially similar, equivalent, ordifferent. In any of the above methods, the sequence data can indicatepresence of a somatic or germline variant, or a copy number variation ora single nucleotide variation, or an indel or gene fusion. Additionallyor alternatively, detection of a genetic biomarker can include a methodof analyzing a nucleic acid population comprising nucleic acids withdifferent extents of modification. 
Additionally or alternatively,detection of a genetic biomarker can include methods for screening forcharacteristics (e.g., 5′ methylcytosine) associated with a disease. Themethod comprises contacting the nucleic acid population with an agent(such as a methyl binding domain or protein) that preferentially bindsto nucleic acids bearing the modification; separating a first pool ofnucleic acids bound to the agent from a second pool of nucleic acidsunbound to the agent, wherein the first pool of nucleic acids areoverrepresented for the modification, and the nucleic acids in thesecond pool are underrepresented for the modification; linking thenucleic acids in the first pool and/or second pool to one or morenucleic acid tags that distinguish the nucleic acids in the first pooland the second pool to produce a population of tagged nucleic acids;amplifying the tagged nucleic acids, wherein the nucleic acids and thelinked tags are amplified; assaying sequence data of the amplifiednucleic acids and linked tags; decoding the tags to reveal whether thenucleic acids for which sequence data has been assayed were amplifiedfrom templates in the first or second pool. Additionally oralternatively, detection of a genetic biomarker can include a method foranalyzing a nucleic acid population in which at least some of thenucleic acids include one or more modified cytosine residues. The methodcomprises linking capture moieties, e.g., biotin, to nucleic acids inthe population to serve as templates for amplification; performing anamplification reaction to produce amplification products from thetemplates; separating the templates linked to capture moieties fromamplification products; assaying sequence data of the templates linkedto capture moieties by bisulfite sequencing; and assaying sequence dataof the amplification products. 
Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a nucleic acid population comprising nucleic acids with different extents of 5-methylcytosine. The method comprises (a) contacting the nucleic acid population with an agent that preferentially binds to 5-methylated nucleic acids; (b) separating a first pool of nucleic acids bound to the agent from a second pool of nucleic acids unbound to the agent, wherein the first pool of nucleic acids are overrepresented for 5-methylcytosine, and the nucleic acids in the second pool are underrepresented for 5-methylation; (c) linking the nucleic acids in the first pool and/or second pool to one or more nucleic acid tags that distinguish the nucleic acids in the first pool and the second pool, wherein the nucleic acid tags linked to nucleic acids in the first pool comprise a capture moiety (e.g., biotin); (d) amplifying the labelled nucleic acids, wherein the nucleic acids and the linked tags are amplified; (e) separating amplified nucleic acids bearing the capture moiety from amplified nucleic acids that do not bear the capture moiety; and (f) assaying sequence data of the separated, amplified nucleic acids.
Additionally or alternatively, detection of a genetic biomarkercan include a method of analyzing a nucleic acid population comprisingat least two forms of nucleic acid selected from double-stranded DNA,single-stranded DNA and single-stranded RNA, the method, wherein each ofthe at least two forms comprises a plurality of molecules, comprising:linking at least one of the forms of nucleic acid with at least one tagnucleic acid to distinguish the forms from one another, amplifying theforms of nucleic acid at least one of which is linked to at least onenucleic acid tag, wherein the nucleic acids and linked nucleic acid tag,are amplified, to produce amplified nucleic acids, of which thoseamplified from the at least one form are tagged; assaying sequence dataof the amplified nucleic acids at least some of which are tagged;wherein the assaying obtains sequence information sufficient to decodethe tag nucleic acid molecules of the amplified nucleic acids to revealthe forms of nucleic acids in the population providing an originaltemplate for the amplified nucleic acids linked to the tag nucleic acidmolecules for which sequence data has been assayed. In one embodimentthe method further comprises the step of decoding the tag nucleic acidmolecules of the amplified nucleic acids to reveal the forms of nucleicacids in the population providing an original template for the amplifiednucleic acids linked to the tag nucleic acid molecules for whichsequence data has been assayed. In another embodiment, the methodfurther comprises enriching for at least one of the forms relative toone or more of the other forms. In another embodiment, at least 70% ofthe molecules of each form of nucleic acid in the population areamplified. In another embodiment, at least three forms of nucleic acidare present in the population and at least two of the forms are linkedto different tag nucleic acid forms distinguishing each of the threeforms from one another. 
In another embodiment, each of the at least three forms of nucleic acid in the population is linked to a different tag. In another embodiment, each molecule of the same form is linked to a tag comprising the same tag information. In another embodiment, molecules of the same form are linked to different types of tags. In another embodiment, the method further comprises subjecting the population to reverse transcription with a tagged primer, wherein the tagged primer is incorporated into cDNA generated from RNA in the population. In another embodiment, the reverse transcription is sequence-specific. In another embodiment, the reverse transcription is random. In another embodiment, the method further comprises degrading RNA duplexed to the cDNA. In another embodiment, the method further comprises separating single-stranded DNA from double-stranded DNA and ligating nucleic acid tags to the double-stranded DNA. In another embodiment, the single-stranded DNA is separated by hybridization to one or more capture probes. In another embodiment, the method further comprises circularizing single-stranded DNA with a circligase and ligating nucleic acid tags to the double-stranded DNA. In another embodiment, the method comprises, before assaying, pooling tagged nucleic acids comprising different forms of nucleic acid. In another embodiment, the nucleic acid population is from a bodily fluid sample. In another embodiment, the bodily fluid sample is blood, serum, or plasma. In another embodiment, the nucleic acid population is a cell free nucleic acid population. In another embodiment, the bodily fluid sample is from a subject suspected of having a cancer. In another embodiment, the sequence data indicates presence of a somatic or germline variant. In another embodiment, the sequence data indicates presence of a copy number variation. In another embodiment, the sequence data indicates presence of a single nucleotide variation (SNV), indel or gene fusion.
Additionally or alternatively, detection of a genetic biomarker can include a method, comprising: providing a population of nucleic acid molecules obtained from a bodily sample of a subject; fractionating the population of nucleic acid molecules based on one or more characteristics to generate a plurality of groups of nucleic acid molecules, wherein the nucleic acid molecules of each of the plurality of groups comprise distinct identifiers; pooling the plurality of groups of nucleic acid molecules; sequencing the pooled plurality of groups of nucleic acid molecules to generate a plurality of sets of sequence reads; and fractionating the sequence reads based on the identifiers. Additionally or alternatively, detection of a genetic biomarker can include a method for analyzing the fragmentation pattern of cell-free DNA comprising: providing a population of cell-free DNA from a biological sample; fractionating the population of cell-free DNA, thereby generating subpopulations of cell-free DNA; sequencing at least one subpopulation of cell-free DNA, thereby generating sequence reads; aligning the sequence reads to a reference genome; and determining the fragmentation pattern of the cell-free DNA in each subpopulation by analyzing any number of the following: length of each sequence read mapping to each base position in the reference genome; number of sequence reads mapping to the base position in the reference genome as a function of length of the sequence reads; number of sequence reads starting at each base position in the reference genome; or number of sequence reads ending at each base position in the reference genome. In another embodiment, the one or more characteristics comprise a chemical modification selected from the group consisting of: methylation, hydroxymethylation, formylation, acetylation, and glycosylation.
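By way of illustration only, the per-position fragmentation measures recited above (reads starting at, ending at, and mapping to each base position, together with the lengths of the mapping reads) might be tabulated as in the following minimal sketch. The function name `fragmentation_profile`, the half-open `(start, end)` coordinate convention, and the restriction to a single chromosome are assumptions made for this example and are not taken from the disclosure.

```python
from collections import defaultdict


def fragmentation_profile(fragments):
    """Tabulate per-position fragmentation measures for aligned fragments.

    `fragments` is an iterable of (start, end) pairs on one chromosome,
    using half-open coordinates (an assumption of this sketch).
    """
    starts = defaultdict(int)     # fragments starting at each position
    ends = defaultdict(int)       # fragments ending at each position
    coverage = defaultdict(int)   # fragments mapping over each position
    lengths = defaultdict(list)   # lengths of fragments over each position

    for start, end in fragments:
        if end <= start:
            raise ValueError("fragment end must be greater than start")
        length = end - start
        starts[start] += 1
        ends[end - 1] += 1        # last base covered by the fragment
        for pos in range(start, end):
            coverage[pos] += 1
            lengths[pos].append(length)

    return {
        "starts": dict(starts),
        "ends": dict(ends),
        "coverage": dict(coverage),
        "lengths": dict(lengths),
    }


profile = fragmentation_profile([(100, 103), (101, 104)])
```

In this sketch each subpopulation obtained by fractionation would be passed through the function separately, so that the resulting profiles can be compared between subpopulations.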

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2018/009723, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include methods, systems, and compositions for performingnucleosome profiling using cell-free nucleic acids (e.g., cfDNA). Thiscan be used to identify new driver genes, determine copy numbervariation (CNV), identify somatic mutations and structural variationssuch as fusions and indels, as well as identify regions that can be usedin a multiplexed assay to detect any of the above variations.Additionally or alternatively, detection of a genetic biomarker caninclude various uses of cell-free nucleic acids (e.g., DNA or RNA). Suchuses include detecting, monitoring and determining treatment for asubject having or suspected of having a health condition, such as adisease (e.g., cancer). The methods provided may use sequenceinformation in a macroscale and global manner, with or without somaticvariant information, to assess a fragmentome profile that can berepresentative of a tissue of origin, disease, progression, etc.Additionally or alternatively, detection of a genetic biomarker caninclude a computer-implemented method for determining a presence orabsence of a genetic aberration in deoxyribonucleic acid (DNA) fragmentsfrom cell-free DNA obtained from a subject, the method comprising: (a)constructing, by a computer, a multi-parametric distribution of the DNAfragments over a plurality of base positions in a genome; and (b)without taking into account a base identity of each base position in afirst locus, using the multi-parametric distribution to determine thepresence or absence of the genetic aberration in the first locus in thesubject. In some embodiments, the genetic aberration comprises asequence aberration. 
In some embodiments, the sequence aberration comprises a single nucleotide variant (SNV). In some embodiments, the sequence aberration comprises an insertion or deletion (indel), or a gene fusion. In some embodiments, the sequence aberration comprises two or more different members selected from the group consisting of (i) a single nucleotide variant (SNV), (ii) an insertion or deletion (indel), and (iii) a gene fusion. In some embodiments, the genetic aberration comprises a copy number variation (CNV). In some embodiments, the multi-parametric distribution comprises a parameter indicative of a length of the DNA fragments that align with each of the plurality of base positions in the genome. In some embodiments, the multi-parametric distribution comprises a parameter indicative of a number of the DNA fragments that align with each of the plurality of base positions in the genome. In some embodiments, the multi-parametric distribution comprises a parameter indicative of a number of the DNA fragments that start or end at each of the plurality of base positions in the genome. In some embodiments, the multi-parametric distribution comprises parameters indicative of two or more of: (i) a length of the DNA fragments that align with each of the plurality of base positions in the genome, (ii) a number of the DNA fragments that align with each of the plurality of base positions in the genome, and (iii) a number of the DNA fragments that start or end at each of the plurality of base positions in the genome. In some embodiments, the multi-parametric distribution comprises parameters indicative of (i) a length of the DNA fragments that align with each of the plurality of base positions in the genome, (ii) a number of the DNA fragments that align with each of the plurality of base positions in the genome, and (iii) a number of the DNA fragments that start or end at each of the plurality of base positions in the genome.
Additionally or alternatively, detection of a genetic biomarkercan include a method of generating a classifier for determining alikelihood that a subject belongs to one or more classes of clinicalsignificance, the method comprising: a) providing a training setcomprising, for each of the one or more classes of clinicalsignificance, populations of cell-free DNA from each of a plurality ofsubjects of a species belonging to the class of clinical significanceand from each of a plurality of subjects of the species not belonging tothe class of clinical significance; b) sequencing cell-free DNAfragments from the populations of cell-free DNA to produce a pluralityof DNA sequences; c) for each population of cell-free DNA, mapping theplurality of DNA sequences to each of one or more genomic regions in areference genome of the species, each genomic region comprising aplurality of genetic loci; d) preparing, for each population ofcell-free DNA, a dataset comprising, for each of a plurality of thegenetic loci, values indicating a quantitative measure of at least onecharacteristic selected from: (i) DNA sequences mapping to the geneticlocus, (ii) DNA sequences starting at the locus, and (iii) DNA sequencesending at the genetic locus, to yield a training set; and e) training acomputer-based machine learning system on the training set, therebygenerating a classifier for determining a likelihood that the subjectbelongs to one or more classes of clinical significance. 
Additionally oralternatively, detection of a genetic biomarker can include a method ofdetermining an abnormal biological state in a subject, the methodcomprising: a) sequencing cell-free DNA fragments from cell-free DNAfrom the subject to produce DNA sequences; b) mapping the DNA sequencesto each of one or more genomic regions in a reference genome of aspecies of the subject, each genomic region comprising a plurality ofgenetic loci; c) preparing a dataset comprising, for each of a pluralityof the genetic loci, values indicating a quantitative measure of atleast one feature selected from: (i) DNA sequences mapping to thegenetic locus, (ii) DNA sequences starting at the locus, and (iii) DNAsequences ending at the genetic locus; and d) based on the dataset,determining a likelihood of the abnormal biological state. Additionallyor alternatively, detection of a genetic biomarker can include acomputer-implemented method for generating an output indicative of apresence or absence of a genetic aberration in deoxyribonucleic acid(DNA) fragments from cell-free DNA obtained from a subject, the methodcomprising: (a) constructing, by a computer, a distribution of the DNAfragments from the cell-free DNA over a plurality of base positions in agenome; and (b) for each of one or more genetic loci, calculating, by acomputer, a quantitative measure indicative of a ratio of (1) a numberof the DNA fragments with dinucleosomal protection associated with agenetic locus from the one or more genetic loci, and (2) a number of theDNA fragments with mononucleosomal protection associated with thegenetic locus, or vice versa; and (c) determining, using thequantitative measure for each of the one or more genetic loci, saidoutput indicative of a presence or absence of the genetic aberration inthe one or more genetic loci in the subject. 
In some embodiments, thedistribution comprises one or more multi-parametric distributions.Additionally or alternatively, detection of a genetic biomarker caninclude a computer-implemented method for deconvolving a distribution ofdeoxyribonucleic acid (DNA) fragments from cell-free DNA obtained from asubject, the method comprising: (a) constructing, by a computer, adistribution of a coverage of the DNA fragments from the cell-free DNAover a plurality of base positions in a genome; and (b) for each of oneor more genetic loci, deconvolving, by a computer, the distribution ofthe coverage, thereby generating fractional contributions associatedwith one or more members selected from the group consisting of a copynumber (CN) component, a cell clearance component, and a gene expressioncomponent. Additionally or alternatively, detection of a geneticbiomarker can include a computer-implemented classifier for determininggenetic aberrations in a test subject using deoxyribonucleic acid (DNA)fragments from cell-free DNA obtained from the test subject, comprising:(a) an input of a set of distribution scores for each of one or morepopulations of cell-free DNA obtained from each of a plurality ofsubjects, wherein each distribution score is generated based at least onone or more of: (i) a length of the DNA fragments that align with eachof a plurality of base positions in a genome, (ii) a number of the DNAfragments that align with each of a plurality of base positions in agenome, and (iii) a number of the DNA fragments that start or end ateach of a plurality of base positions in a genome; and (b) an output ofclassifications of one or more genetic aberrations in the test subject.Additionally or alternatively, detection of a genetic biomarker caninclude a computer-implemented method for creating a trained classifier,comprising: (a) providing a plurality of different classes, wherein eachclass represents a set of subjects with a shared characteristic; (b) foreach of a plurality of 
populations of cell-free DNA obtained from eachof the classes, providing a multi-parametric model representative ofcell-free deoxyribonucleic acid (DNA) fragments from the populations ofcell-free DNA, thereby providing a training data set; and (c) training,by a computer, a learning algorithm on the training data set to createone or more trained classifiers, wherein each trained classifier isconfigured to classify a test population of cell-free DNA from a testsubject into one or more of the plurality of different classes.Additionally or alternatively, detection of a genetic biomarker caninclude a method of classifying a test sample from a subject,comprising: (a) providing a multi-parametric model representative ofcell-free deoxyribonucleic acid (DNA) fragments from a test populationof cell-free DNA from the subject; and (b) classifying the testpopulation of cell-free DNA using a trained classifier. Additionally oralternatively, detection of a genetic biomarker can include acomputer-implemented method comprising: (a) generating, by a computer,sequence information from cell-free DNA fragments from a subject; (b)mapping, by a computer, the cell-free DNA fragments to a referencegenome based on the sequence information; and (c) analyzing, by acomputer, the mapped cell-free DNA fragments to determine, at each of aplurality of base positions in the reference genome, a plurality ofmeasures selected from the group consisting of: (i) number of cell-freeDNA fragments mapping to the base position, (ii) length of eachcell-free DNA fragment mapping to the base position, (iii) number ofcell-free DNA fragments mapping to the base position as a function oflength of the cell-free DNA fragment; (iv) number of cell-free DNAfragments starting at the base position; (v) number of cell-free DNAfragments ending at the base position; (vi) number of cell-free DNAfragments starting at the base position as a function of length, and(vii) number of cell-free DNA fragments ending at the base 
position as a function of length. Additionally or alternatively, detection of a genetic biomarker can include a computer-implemented method for deconvolving a distribution of deoxyribonucleic acid (DNA) fragments from cell-free DNA obtained from a subject, the method comprising: (a) constructing, by a computer, a distribution of a coverage of the DNA fragments from the cell-free DNA over a plurality of base positions in a genome; and (b) for each of one or more genetic loci, deconvolving, by a computer, the distribution of the coverage, thereby generating fractional contributions associated with one or more members selected from the group consisting of a copy number (CN) component, a cell clearance component, and a gene expression component. In some embodiments, the method further comprises generating an output indicative of a presence or absence of a genetic aberration based at least on a portion of the fractional contributions.
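By way of illustration only, the quantitative measure described above as a ratio of DNA fragments with dinucleosomal protection to DNA fragments with mononucleosomal protection at a genetic locus might be computed as in the following minimal sketch. The fragment-length windows `MONO_RANGE` and `DI_RANGE`, the assignment of a fragment to a locus by its midpoint, and the function name `nucleosome_ratio` are all assumptions made for this example; the disclosure does not specify them.

```python
# Hypothetical length windows (in base pairs) used to call a fragment
# mononucleosomal or dinucleosomal; these cutoffs are illustrative only.
MONO_RANGE = (120, 220)
DI_RANGE = (280, 420)


def nucleosome_ratio(fragments, loci):
    """Return, for each locus, the dinucleosomal/mononucleosomal count ratio.

    `fragments` is a list of (start, end) half-open intervals and `loci`
    maps a locus name to a (start, end) half-open interval. A fragment is
    assigned to a locus when its midpoint falls inside the locus. The ratio
    is None when no mononucleosomal fragment is observed at the locus.
    """
    ratios = {}
    for name, (locus_start, locus_end) in loci.items():
        mono = 0
        di = 0
        for start, end in fragments:
            midpoint = (start + end) // 2
            if not (locus_start <= midpoint < locus_end):
                continue
            length = end - start
            if MONO_RANGE[0] <= length <= MONO_RANGE[1]:
                mono += 1
            elif DI_RANGE[0] <= length <= DI_RANGE[1]:
                di += 1
        ratios[name] = di / mono if mono else None
    return ratios


example = nucleosome_ratio(
    [(0, 160), (10, 180), (0, 320), (1000, 1170)],
    {"A": (0, 500), "B": (500, 2000), "C": (5000, 6000)},
)
```

The inverse ratio (mononucleosomal to dinucleosomal) could be obtained in the same way by exchanging the two counts.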

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/181146, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and systems that may be used for early cancer detection. In some embodiments, the method comprises (a) providing a sample comprising cfDNA from a subject, wherein the subject does not detectably exhibit a cancer; (b) capturing from the sample cfDNA molecules covered by a sequencing panel, wherein the sequencing panel comprises one or more regions from each of a plurality of different genes, wherein: (i) the sequencing panel is no greater than 50,000 nucleotides; (ii) the presence of a tumor marker in any one of the different genes indicates that the subject has the cancer; and (iii) at least 80% of subjects having the cancer have a tumor marker present in at least one of the plurality of different genes; and (c) sequencing the captured cfDNA molecules to a read depth sufficient to detect the tumor markers at a frequency in the sample as low as 0.01%. In some embodiments, a tumor marker is selected from the group consisting of a single base substitution, a copy number variation, an indel, a gene fusion, a transversion, a translocation, an inversion, a deletion, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, chromosome fusions, a gene truncation, a gene amplification, a gene duplication, a chromosomal lesion, a DNA lesion, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns and abnormal changes in nucleic acid methylation. In some embodiments, at least 85%, at least 90%, at least 93%, at least 95%, at least 97%, at least 98% or at least 99% of subjects having the cancer have a tumor marker present in at least one of the plurality of different genes.
Some embodimentscomprise sequencing the captured cfDNA molecules to a read depthsufficient to detect the tumor markers at a frequency in the sample aslow as 0.005%, 0.001% or 0.0005%. Additionally or alternatively,detection of a genetic biomarker can include a method comprising: a.providing a sample comprising cell-free nucleic acid (cfNA) moleculesfrom a subject, wherein the subject does not detectably exhibit acancer; b. capturing from the sample cfNA molecules covered by asequencing panel, wherein the sequencing panel comprises one or moreregions from each of a plurality of different genes, wherein: i. thesequencing panel is no greater than 50,000 nucleotides; ii. a presenceof a tumor marker in any one of the different genes indicates that thesubject has the cancer; and iii. at least 80% of subjects having thecancer have a tumor marker present in at least one of the plurality ofdifferent genes; and c. sequencing the captured cfNA molecules to a readdepth sufficient to detect the tumor markers at a frequency in thesample as low as 1.0%, 0.75%, 0.5%, 0.25%, 0.1%, 0.075%, 0.05%, 0.025%,0.01%, or 0.005%. Additionally or alternatively, detection of a geneticbiomarker can include a method for detecting cancer in a subjectcomprising: sequencing circulating cell-free DNA (cfDNA) from thesubject at a depth of at least 50,000 reads per base to detect one ormore genetic variants associated with cancer. In some embodiments, thesequencing is at a depth of at least 100,000 reads per base. In someembodiments, the sequencing is at a depth of about 120,000 reads perbase. In some embodiments, the sequencing is at a depth of about 150,000reads per base. In some embodiments, the sequencing is at a depth ofabout 200,000 reads per base. 
In some embodiments, the reads per baserepresent at least 5,000 original nucleic acid molecules, at least10,000 original nucleic acid molecules, at least 20,000 original nucleicacid molecules, at least 30,000 original nucleic acid molecules, atleast 40,000 original nucleic acid molecules, or at least 50,000original nucleic acid molecules. In some embodiments, the method furthercomprises comparing sequence information from the cfDNA to sequenceinformation obtained from a cohort of healthy individuals, a cohort ofcancer patients, or germline DNA from the subject. In some embodiments,the method further comprises amplifying the cfDNA prior to sequencing,and determining a consensus sequence from sequence reads obtained fromthe sequencing to reduce errors from amplification or sequencing. Insome embodiments, determining the consensus sequence is performed on amolecule-by-molecule basis. In some embodiments, determining theconsensus sequence is performed on a base by base basis. In someembodiments, detection of consensus sequence is based on assessingprobabilities of each of the potential nucleotides based on the observedsequencing output, as well as sequencing and amplification error profilecharacteristics of an individual sample, a batch of samples, or areference set of samples. In some embodiments, determining the consensussequence is performed using molecular barcodes that tag individual cfDNAmolecules derived from the subject. In some embodiments, a set ofmolecules with a consensus sequence deviant from the human reference iscompared to those observed in other samples processed in the laboratoryto determine and exclude any potential contaminating event. In someembodiments, determining the consensus sequence is optimized bycomparing the consensus sequence to those obtained from the cohort ofhealthy individuals, the cohort of cancer patients, or the germline DNAfrom the subject. 
In some embodiments, the method further comprisestagging the cfDNA molecules with a barcode such that at least 20% of thecfDNA in a sample derived from the subject are tagged. In someembodiments, the tagging is performed by attaching adaptors comprising abarcode. In some embodiments, the adaptors comprise any or all of bluntend adaptors, restriction enzyme overhang adaptors, or adaptors with asingle nucleotide overhang. In some embodiments, the adaptors with asingle nucleotide overhang comprise C-tail adaptors, A-tail adaptors,T-tail adaptors, and/or G-tail adaptors. In some embodiments, thetagging is performed by PCR amplification using primers with barcodes.In some embodiments, the barcode is single stranded. In someembodiments, the barcode is double stranded. In some embodiments, themethod further comprises dividing the cfDNA into partitions. In someembodiments, the cfDNA in each partition is uniquely tagged with respectto each other partition. In some embodiments, the cfDNA in eachpartition is non-uniquely tagged with respect to each other partition.In some embodiments, the cfDNA in each partition is not tagged.Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting a tumor in a subject suspected of havingcancer or having cancer, comprising: (a) sequencing cell-free DNA(cfDNA) molecules derived from a cell-free DNA (cfDNA) sample obtainedfrom the subject; (b) analyzing sequence reads derived from thesequencing to identify (i) circulating tumor DNA (ctDNA) among the cfDNAmolecules and (ii) one or more driver mutations in the cfDNA; and (c)using information about the presence, absence, or amount of the one ormore driver mutations in the ctDNA molecules to identify (i) the tumorin the subject and (ii) actions for treatment of the tumor to be takenby the subject, wherein the method detects the tumor in the subject witha sensitivity of at least 85%, a specificity of at least 99%, and adiagnostic accuracy of at least 99%.
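By way of illustration only, determining a consensus sequence on a base by base basis using molecular barcodes might be sketched as follows. This example substitutes a simple majority vote for the probability-based assessment described above, and it assumes that reads sharing a barcode derive from the same original cfDNA molecule and are already aligned to equal length. The function name `consensus_by_barcode` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_fraction=0.6):
    """Collapse reads into one consensus sequence per molecular barcode.

    `reads` is an iterable of (barcode, sequence) pairs. At each position
    the most common base is kept if it is supported by at least
    `min_fraction` of the reads in the family; otherwise "N" is emitted.
    """
    families = defaultdict(list)
    for barcode, sequence in reads:
        families[barcode].append(sequence)

    consensus = {}
    for barcode, sequences in families.items():
        length = len(sequences[0])
        if any(len(s) != length for s in sequences):
            raise ValueError("reads in a family must have equal length")
        bases = []
        for column in zip(*sequences):   # one column per base position
            base, count = Counter(column).most_common(1)[0]
            supported = count / len(sequences) >= min_fraction
            bases.append(base if supported else "N")
        consensus[barcode] = "".join(bases)
    return consensus


collapsed = consensus_by_barcode(
    [("AA", "ACGT"), ("AA", "ACGT"), ("AA", "ACTT"), ("CC", "GGGG")]
)
```

In the family tagged "AA" above, the single discordant base at the third position is outvoted, which is how such collapsing can reduce errors introduced by amplification or sequencing.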

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2017/136603, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include methods and systems for detecting or monitoringcancer evolution. Additionally or alternatively, detection of a geneticbiomarker can include a computer-implemented method, comprising: (a)obtaining information about a plurality of subjects with cancer at afirst time point, wherein the information comprises for each subject ofthe plurality of subjects at least a genetic profile of a tumor obtainedby genotyping nucleic acids from a cell-free bodily fluid and anytreatment provided to the subject before the first time point, anddetermining a first state of each of the plurality of subjects based onthe information at the first time point to produce a set of firststates; (b) obtaining the information about the plurality of subjects atone or more second time points subsequent to the first time point, anddetermining a second state of each of the plurality of subjects at eachof the one or more second time points based on the information at agiven one of the one or more second time points, to produce a set ofsubsequent states; and (c) using the set of first states from (a) andthe set of subsequent states from (b) to generate a predictive algorithmthat is configured to determine a probability that a given first statewill result in a second state among a set of states at a later timepoint subsequent to the given first state. In some embodiments, themethod further comprises (d) for the given first state among a set ofstates at an earlier time point, determining the probability that thegiven first state will result in the second state among the set ofstates at the later time point; and (e) generating an electronic outputindicative of the probability determined in (d). 
Additionally oralternatively, detection of a genetic biomarker can include acomputer-implemented method, comprising: (a) obtaining information abouta plurality of subjects with cancer at a first time point, wherein theinformation comprises, for each subject of the plurality of subjects, atleast a genetic profile of a tumor obtained by genotyping at least 50genes and any treatment provided to the subject before the first timepoint, and determining a first state of each of the plurality ofsubjects based on the information at the first time point, to produce aset of first states; (b) obtaining the information about the pluralityof subjects at one or more second time points subsequent to the firsttime point, and determining a second state of each of the plurality ofsubjects at each of the one or more second time points based on theinformation at a given one of the one or more second time points, toproduce a set of subsequent states; and (c) using the set of firststates from (a) and the set of subsequent states from (b) to generate apredictive algorithm that is configured to determine a probability thata given first state will result in a second state among a set of statesat a later time point subsequent to the given first state. In someembodiments, the method further comprises (d) for the given first stateamong a set of states at an earlier time point, determining theprobability that the given first state will result in the second stateamong the set of states at the later time point; and (e) generating anelectronic output indicative of the probability determined in (d). Insome embodiments, obtaining the information comprises sequencingcell-free deoxyribonucleic acid (cfDNA) from the plurality of subjectsand, optionally, performing a medical interview of each of the pluralityof subjects. In some embodiments, treatment was provided to the subjectbefore the first time point. 
In some embodiments, the methods comprise generating one or more decision trees, each decision tree comprising a root node, one or more decision branches, one or more decision nodes, and one or more terminal nodes, wherein a state at the root node represents the first time point, the one or more decision branches represent alternative treatments, and the one or more decision nodes and the one or more terminal nodes represent subsequent states. In some embodiments, the one or more decision branches comprise a plurality of decision branches. In some embodiments, the subsequent states comprise a viability state(s) of the subjects indicative of the subjects being alive or deceased. In some embodiments, the subsequent states comprise a subject survival rate. In some embodiments, each of the first states comprises a common set of one or more somatic mutations. In some embodiments, the information further comprises a subject profile. Additionally or alternatively, detection of a genetic biomarker can include a method, comprising: (a) obtaining information about a subject with a cancer at a first time point, wherein the information comprises at least one characteristic of the subject from a patient profile, a tumor profile, or a treatment; (b) determining an initial state of the subject based on the information at the first time point; (c) determining a probability for each of a plurality of subsequent states at each of one or more subsequent time points based on the initial state of the subject, thereby providing a set of probabilities with regards to state outcomes; (d) generating a recommendation of a treatment for the cancer based at least in part on the set of probabilities with regards to state outcomes that optimizes for a probability that the subject obtains a particular outcome; and (e) generating an electronic output indicative of the recommendation generated in (d).
In some embodiments, the probability is at least in part a function of a treatment choice from among a plurality of treatment choices. In some embodiments, the one or more subsequent time points comprises a plurality of subsequent time points. In some embodiments, the method further comprises determining the probability at a plurality of subsequent time points. In some embodiments, the time points comprise at least three time points. In some embodiments, the time points comprise at least four time points. In some embodiments, the first time point is prior to the subject receiving the treatment and the subsequent time point is after the subject receiving the treatment. In some embodiments, a second treatment is administered after the subsequent time point based on the subsequent state at the subsequent time point. In some embodiments, the at least one characteristic of the subject is from the patient profile and is selected from the group consisting of: age, gender, genetic profile, enzyme levels, organ function, quality of life, frequency of medical interventions, remission status, and patient outcome. 
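As a minimal sketch of how a set of first states and a set of subsequent states could be turned into transition probabilities and a treatment recommendation, the following example estimates the probability of each subsequent state from observed frequencies. The function names, the state and treatment labels, and the frequency-based estimator are assumptions made for illustration; the methods described above do not prescribe a particular algorithm.

```python
from collections import Counter, defaultdict


def fit_transition_model(observations):
    """Estimate P(subsequent state | first state, treatment) from observed triples.

    `observations` is an iterable of (first_state, treatment, subsequent_state).
    A plain relative-frequency estimate is used here (an illustrative assumption).
    """
    counts = defaultdict(Counter)
    for first_state, treatment, subsequent_state in observations:
        counts[(first_state, treatment)][subsequent_state] += 1

    model = {}
    for key, state_counts in counts.items():
        total = sum(state_counts.values())
        model[key] = {state: n / total for state, n in state_counts.items()}
    return model


def recommend_treatment(model, first_state, desired_state):
    """Return (treatment, probability) maximizing the chance of `desired_state`."""
    best = None
    for (state, treatment), probabilities in model.items():
        if state != first_state:
            continue
        p = probabilities.get(desired_state, 0.0)
        if best is None or p > best[1]:
            best = (treatment, p)
    return best


# Hypothetical toy records: (first state, treatment, subsequent state).
observations = [
    ("state_1", "therapy_A", "remission"),
    ("state_1", "therapy_A", "progression"),
    ("state_1", "therapy_B", "remission"),
    ("state_1", "therapy_B", "remission"),
    ("state_1", "therapy_B", "remission"),
    ("state_1", "therapy_B", "progression"),
]

model = fit_transition_model(observations)
recommendation = recommend_treatment(model, "state_1", "remission")
print(recommendation)
```

With these toy records, the estimate for remission is higher under the hypothetical `therapy_B` than under `therapy_A`, so `therapy_B` is the treatment returned. In practice, a state would encode the genetic profile, prior treatment, and any subject profile information rather than a single label.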
Additionally or alternatively, detection of a genetic biomarker can include a method, comprising: (a) establishing one or more communications links over a communication network with one or more medical service providers; (b) receiving over the communications network from the one or more medical service providers medical information about one or more subjects; (c) receiving from the medical service provider one or more samples comprising cell-free deoxyribonucleic acid (cfDNA) from each of the one or more subjects; (d) sequencing the cfDNA and identifying one or more genetic variants present in the cfDNA; (e) creating or supplementing a database with information for each of the one or more subjects, the information comprising both identified genetic variants and received medical information; and (f) using the database and a computer-implemented algorithm, generating at least one predictive model that predicts, based on an initial state of a subject, the probability of a subsequent state for each of a plurality of different therapeutic interventions. 
Additionally or alternatively, detection of a genetic biomarker can include a non-transitory computer-readable medium comprising machine executable code that, upon execution by one or more computer processors, implements a method comprising: (a) obtaining information about a plurality of subjects with cancer at a first time point, wherein the information comprises, for each subject of the plurality of subjects, at least a genetic profile of a tumor obtained by genotyping nucleic acids from a cell-free bodily fluid and any treatment provided to the subject before the first time point, and determining a first state of each of the plurality of subjects based on the information at the first time point, to produce a set of first states; (b) obtaining the information about the plurality of subjects at one or more second time points subsequent to the first time point, and determining a second state of each of the plurality of subjects at each of the one or more second time points based on the information at a given one of the one or more second time points, to produce a set of subsequent states; and (c) using the set of first states from (a) and the set of subsequent states from (b) to generate a predictive algorithm that is configured to determine a probability that a given first state will result in a second state among a set of states at a later time point subsequent to the given first state. 
Additionally or alternatively, detection of a genetic biomarker can include a method, comprising: (a) obtaining information about a subject comprising at least a genetic profile of a tumor and a treatment previously or currently provided to the subject, if any, and determining an initial state of the subject based on the information; (b) providing a decision tree, wherein a root node represents an initial subject state, decision branches represent alternative treatments available to the subject, chance nodes represent points of uncertainty, and decision nodes or terminal nodes represent subsequent states; (c) providing a course of treatment for the subject that maximizes a probability of the subject achieving a living state at a terminal node; and (d) administering the course of treatment to the subject. In some embodiments, the method further comprises: (e) at a second time point subsequent to the initial state, obtaining information about a subject comprising at least a genetic profile of a tumor and a treatment previously or currently provided to the subject, if any, and determining a second state of the subject among a plurality of subsequent states based on the information; (f) based on the second state, providing a subsequent course of treatment for the subject that maximizes a probability of the subject achieving a living state at a terminal node; and (g) administering the subsequent course of treatment to the subject. 
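The decision tree described above, with decision branches for alternative treatments, chance nodes for points of uncertainty, and terminal nodes for final states, can be evaluated by working backward from the terminal nodes. The sketch below is illustrative only: the dictionary-based node layout, the treatment names, and all probabilities are hypothetical and are not taken from the methods described above.

```python
def survival_probability(node):
    """Best achievable probability of reaching a living terminal state from `node`."""
    kind = node["kind"]
    if kind == "terminal":
        return 1.0 if node["alive"] else 0.0
    if kind == "chance":
        # Weight each possible subsequent state by its probability.
        return sum(p * survival_probability(child) for p, child in node["outcomes"])
    # Decision node: choose the treatment branch with the highest probability.
    return max(survival_probability(child) for child in node["options"].values())


def best_treatment(decision_node):
    """Return (treatment, probability) for the best branch of a decision node."""
    scores = {
        name: survival_probability(child)
        for name, child in decision_node["options"].items()
    }
    choice = max(scores, key=scores.get)
    return choice, scores[choice]


# Hypothetical tree with made-up probabilities.
alive = {"kind": "terminal", "alive": True}
deceased = {"kind": "terminal", "alive": False}

second_line = {
    "kind": "decision",
    "options": {
        "therapy_C": {"kind": "chance", "outcomes": [(0.5, alive), (0.5, deceased)]},
        "therapy_D": {"kind": "chance", "outcomes": [(0.25, alive), (0.75, deceased)]},
    },
}

root = {
    "kind": "decision",
    "options": {
        "therapy_A": {"kind": "chance", "outcomes": [(0.5, alive), (0.5, second_line)]},
        "therapy_B": {"kind": "chance", "outcomes": [(0.625, alive), (0.375, deceased)]},
    },
}

choice, probability = best_treatment(root)
print(choice, probability)
```

Re-running `best_treatment` on the decision node reached at a second time point (here `second_line`) corresponds to providing a subsequent course of treatment based on the second state.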

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 10,017,759, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for preparing a library of nucleic acid fragments, and more specifically methods for preparing a library of nucleic acid fragments in a single tube using proteases for a variety of applications including, e.g., next generation DNA sequencing. Additionally, detection of a genetic biomarker can include a method of preparing a library of tagged nucleic acid fragments, the method including (a) contacting a population of cells directly with a lysis reagent to generate a cell lysate, wherein the lysis reagent has one or more proteases, and wherein the cell lysate contains a target nucleic acid; (b) inactivating the one or more proteases to form an inactivated cell lysate, and (c) directly applying at least one transposase and at least one transposon end composition containing a transferred strand to the inactivated cell lysate under conditions where the target nucleic acid and the transposon end composition undergo a transposition reaction to generate a mixture, wherein (i) the target nucleic acid is fragmented to generate a plurality of target nucleic acid fragments, and (ii) the transferred strand of the transposon end composition is joined to 5′ ends of each of a plurality of the target nucleic acid fragments to generate a plurality of 5′ tagged target nucleic acid fragments. In some embodiments, steps (a), (b), and (c) are performed in a single reaction mixture, e.g., in a tube. In some embodiments, the population of cells is a minimal population of cells. In some embodiments, the minimal population of cells contains one, two, three, four, or five cells. 
In some embodiments, the target nucleic acid is a double-stranded DNA, and wherein the target nucleic acid remains the double-stranded DNA prior to applying a transposase and a transposon end composition in step (c). In some embodiments, the target nucleic acid is genomic DNA. In some embodiments, the target nucleic acid contains chromosomal DNA or a fragment thereof. In some embodiments, the target nucleic acid includes a genome or a partial genome. In some embodiments, the method further includes (d) incubating the mixture from step (c) directly with at least one nucleic acid modifying enzyme under conditions wherein a 3′ tag is joined to the 5′ tagged target nucleic acid fragments to generate a plurality of di-tagged target nucleic acid fragments. In some embodiments, steps (a), (b), (c), and (d) are performed in a single reaction tube. In some embodiments, the method further includes (e) amplifying one or more di-tagged target nucleic acid fragments to generate a library of tagged nucleic acid fragments with additional sequence at 5′ end and/or 3′ end of the di-tagged nucleic acid fragments. In some embodiments, steps (a), (b), (c), (d), and (e) are performed in a single reaction tube. In some embodiments, the amplifying includes use of one or more of a polymerase chain reaction (PCR), a strand-displacement amplification reaction, a rolling circle amplification reaction, a ligase chain reaction, a transcription-mediated amplification reaction, or a loop-mediated amplification reaction. In some embodiments, the amplifying includes a PCR using a single primer that is complementary to the 3′ tag of the di-tagged target DNA fragments. 
In some embodiments, the amplifying includes a PCR using a first and a second primer, wherein at least a 3′ end portion of the first primer is complementary to at least a portion of the 3′ tag of the di-tagged target nucleic acid fragments, and wherein at least a 3′ end portion of the second primer exhibits the sequence of at least a portion of the 5′ tag of the di-tagged target nucleic acid fragments. In some embodiments, a 5′ end portion of the first primer is non-complementary to the 3′ tag of the di-tagged target nucleic acid fragments, and a 5′ end portion of the second primer does not exhibit the sequence of at least a portion of the 5′ tag of the di-tagged target nucleic acid fragments. In some embodiments, the first primer includes a first universal sequence, and/or wherein the second primer includes a second universal sequence. In some embodiments, the method further includes sequencing the tagged nucleic acid fragments. In some embodiments, the sequencing of the tagged nucleic acid fragments includes use of one or more of sequencing by synthesis, bridge PCR, chain termination sequencing, sequencing by hybridization, nanopore sequencing, and sequencing by ligation. In some embodiments, the sequencing of the tagged nucleic acid fragments includes use of next generation sequencing. In some embodiments, the method further includes analyzing copy number variation. In some embodiments, the method further includes analyzing single nucleotide variation.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,944,924, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include the use of application-specific capture primers in next generation sequencing. A method of modifying an immobilized capture primer can include: a) providing a solid support having an immobilized application-specific capture primer, the application-specific capture primer including: i) a 3′ portion including an application-specific capture region, and ii) a 5′ portion including a universal capture region; b) contacting an application-specific polynucleotide with the application-specific capture primer under conditions sufficient for hybridization to produce an immobilized application-specific polynucleotide, and c) removing the application-specific capture region of an application-specific capture primer not hybridized to an application-specific polynucleotide to convert the unhybridized application-specific capture primer to a universal capture primer. In some embodiments, a portion of the application-specific capture region is removed. In some embodiments, the application-specific capture primer comprises a plurality of different immobilized application-specific capture primers. In some embodiments, the application-specific polynucleotide comprises a plurality of different application-specific polynucleotides. In some embodiments, the application-specific capture region includes a target-specific capture region and the application-specific polynucleotide includes a target polynucleotide. In some embodiments, the application-specific capture region includes a transposon end (TE) region and the application-specific polynucleotide includes a TE oligonucleotide. 
In some embodiments, the method further includes applying an oligonucleotide before execution of step c) under conditions sufficient for oligonucleotide hybridization with the universal capture region of an application-specific capture primer to produce a double-stranded DNA region. In certain embodiments, the oligonucleotide is a P5 or P7 oligonucleotide. In some embodiments, the method further includes applying an oligonucleotide before execution of step c) under conditions sufficient for oligonucleotide hybridization with the application-specific capture region of an application-specific capture primer to produce a double-stranded DNA region. In some embodiments, the method further includes contacting the application-specific capture primer with a nuclease, wherein the application-specific capture region of an application-specific capture primer not hybridized with an application-specific polynucleotide is removed by the nuclease. In some embodiments, the nuclease is an exonuclease. In some embodiments, the exonuclease is exonuclease I. In some embodiments, the exonuclease is exonuclease III. In some embodiments, the nuclease is an endonuclease.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,992,598, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for amplicon preparation. The method includes: (a) contacting a nucleic acid sample including a plurality of target polynucleotides with at least one primer under conditions sufficient for hybridization, the at least one primer containing an adapter; (b) amplifying by polymerase chain reaction (PCR) the plurality of target polynucleotides to produce a plurality of amplicons; (c) directly contacting a plurality of target specific capture primers immobilized on a solid support with the plurality of amplicons under conditions sufficient for hybridization to produce a first plurality of immobilized amplicons, the solid support further including a plurality of universal capture primers; (d) extending the plurality of target specific capture primers to produce a plurality of immobilized extension products complementary to the target polynucleotides; (e) annealing the plurality of universal capture primers to the plurality of the immobilized extension products, and (f) amplifying by PCR the plurality of immobilized extension products to produce a second plurality of immobilized amplicons, wherein the population of immobilized amplicons includes a uniformity of 85% or more. The method can be used with 10 ng or less input nucleic acid and can further include sequencing the second plurality of immobilized amplicons. The method also can be used for determining the presence of a gene associated with a disorder or disease, including a cancer associated gene. 
Cell free DNA also can be employed in the method. Additionally, detection of a genetic biomarker can include a method for increasing detection sensitivity of a nucleic acid sequence variant, which includes: (a) contacting a nucleic acid sample including a plurality of target polynucleotides with gene specific forward and reverse primers under conditions sufficient for hybridization, each species of the gene specific forward primer including a unique sequence index and an adapter; (b) amplifying by polymerase chain reaction (PCR) the plurality of target polynucleotides to produce a plurality of amplicons; (c) directly contacting a plurality of target specific capture primers immobilized on a solid support with the plurality of amplicons under conditions sufficient for hybridization to produce a first plurality of immobilized amplicons, the solid support further including a plurality of universal capture primers; (d) extending the plurality of target specific capture primers to produce a plurality of immobilized extension products complementary to the target polynucleotides; (e) annealing the plurality of universal capture primers to the plurality of the immobilized extension products; (f) amplifying by PCR the plurality of immobilized extension products to produce a second plurality of immobilized amplicons, wherein the second plurality of immobilized amplicons includes a uniformity of 85% or more; (g) sequencing the second plurality of immobilized amplicons, and (h) eliminating random sequence errors for one or more target polynucleotides by comparing three or more nucleotide sequences at a variant position for a target polynucleotide species, wherein the target polynucleotide species are identified by the unique sequence index to thereby determine a true nucleotide sequence variant in the one or more target polynucleotides. The method can detect a mismatch rate of 0.3% or less for a variant nucleotide position.
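Step (h) above, in which random sequence errors are eliminated by comparing three or more sequences that share a unique sequence index, can be sketched as a per-index consensus call. The function name, the read representation, and the `min_fraction` agreement threshold are assumptions made for illustration; only the requirement of three or more sequences comes from the text.

```python
from collections import Counter, defaultdict


def consensus_calls(reads, position, min_reads=3, min_fraction=0.6):
    """Call one base per unique sequence index at a variant position.

    `reads` is an iterable of (index, sequence) pairs. Index families with
    fewer than `min_reads` reads are skipped. `min_fraction` is a hypothetical
    agreement threshold, not a value taken from the described method.
    """
    families = defaultdict(list)
    for index, sequence in reads:
        families[index].append(sequence[position])

    calls = {}
    for index, bases in families.items():
        if len(bases) < min_reads:
            continue  # too few reads to separate a true variant from an error
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_fraction:
            calls[index] = base
    return calls


# Hypothetical reads: (unique sequence index, read sequence).
reads = [
    ("AAC", "ACGT"), ("AAC", "ACGT"), ("AAC", "ACTT"),  # one read carries an error
    ("GGT", "ACTT"), ("GGT", "ACTT"), ("GGT", "ACTT"),  # consistent variant
    ("CCA", "ACTT"),                                    # single read, not called
]

calls = consensus_calls(reads, position=2)
print(calls)
```

In this toy input, the lone `T` in the `AAC` family is outvoted and treated as a random error, while the `GGT` family supports `T` consistently, which is the pattern expected for a true variant in the original molecule.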

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,879,312, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for the selective enrichment of nucleic acids. Some embodiments include the selective enrichment of long nucleic acids comprising a target nucleic acid. Some embodiments include the selective enrichment of PCR products. Some embodiments of the methods comprise (a) contacting the population of nucleic acids with a nickase, thereby producing a population of nicked nucleic acids; (b) contacting the population of nicked nucleic acids with an exonuclease, thereby generating a nucleic acid having a single-stranded portion, wherein the single-strand portion comprises at least a portion of the target; (c) contacting a capture probe to the at least a portion of the target, wherein the probe hybridizes to the target; and (d) separating a nucleic acid hybridized to the capture probe from a nucleic acid not bound to the capture probe. In other embodiments, the methods comprise (a) obtaining a population of nucleic acids, wherein at least some of the nucleic acids in the population comprise a target; (b) contacting the population of nucleic acids with a nickase, thereby producing a population of nicked nucleic acids; (c) contacting the population of nicked nucleic acids with an exonuclease, thereby generating a nucleic acid having a single-stranded portion, wherein the single-strand portion comprises at least a portion of the target; (d) contacting a capture probe to the at least a portion of the target, wherein the probe hybridizes to the target; and (e) separating a nucleic acid hybridized to the capture probe from a nucleic acid not bound to the capture probe. Some embodiments of the methods also include a step of releasing the hybridized nucleic acid from the capture probe. 
Other embodiments also include amplifying the target. Still further embodiments additionally include sequencing at least a portion of the target. In some embodiments, one or more process steps, for example step (a), can also include contacting the population of double stranded nucleic acids with a type II restriction endonuclease that includes an isoschizomer of the nickase; and recircularizing the cut double stranded nucleic acids under conditions that favor intramolecular recircularization of individual nucleic acids. In such embodiments, various type II restriction endonucleases, or combinations of type II restriction endonucleases, can be used. In some embodiments, for example, the restriction endonuclease includes BbvCI. In other embodiments, the nickase includes Nb.BbvCI and Nt.BbvCI. In some embodiments of the methods, the probe includes a capture moiety. In some such embodiments, the capture moiety includes biotin or streptavidin. In some embodiments, the step of separating a nucleic acid hybridized to the capture probe from a nucleic acid not bound to the capture probe also includes contacting the hybridized target and probe to a binding moiety. In some embodiments, the binding moiety includes avidin and streptavidin. In some embodiments, the binding moiety also includes a bead, microsphere or other particle. Embodiments of the methods also include repeating one or more steps of the process. In certain embodiments, all of the method steps are repeated. In some embodiments of the methods, the target includes a first capture moiety, and the probe includes a second capture moiety. Some such embodiments also include contacting the first capture moiety to a first binding moiety, thereby providing for enrichment of the target, and contacting the second capture moiety to a second binding moiety, thereby providing for enrichment of the probe. 
In addition to the foregoing, some embodiments also provide methods for the selective enrichment of a nucleic acid that comprise the steps of (a) providing a population of nucleic acids, wherein at least some of the nucleic acids in the population include a target hybridized with a capture probe; (b) locking the hybridized probe to the target; and (c) separating a nucleic acid locked to a probe from a nucleic acid that is not locked to a probe. In other embodiments, the methods comprise (a) obtaining a population of nucleic acids, wherein at least some of the nucleic acids in the population include a target; (b) hybridizing the target with a capture probe; (c) locking the hybridized probe to the target; and (d) separating a nucleic acid locked to a probe from a nucleic acid that is not locked to a probe. In addition to the foregoing, some embodiments of the methods also include methods for selective enrichment of a nucleic acid that comprise (a) providing a population of nucleic acids, wherein at least some of the nucleic acids in the population include a target that comprises a portion of the 5′ end of a nucleic acid and a portion of the 3′ end of the nucleic acid, said target being hybridized to a selector probe that comprises a first and second oligonucleotide annealed together, wherein the first oligonucleotide is complementary to at least a portion of the 5′ end of the nucleic acid and complementary to at least a portion of the second oligonucleotide, and the second oligonucleotide is complementary to at least a portion of the 3′ end of the nucleic acid; (b) joining the selector probe to the target; and (c) separating a nucleic acid joined to the selector probe from a nucleic acid not joined to the selector probe. 
Other embodiments of the enrichment methods comprise the steps of (a) obtaining a population of nucleic acids, wherein at least some of the nucleic acids in the population include a target, the target including a portion of the 5′ end of a nucleic acid and a portion of the 3′ end of the nucleic acid; (b) obtaining a selector probe that comprises a first and second oligonucleotide annealed together, wherein the first oligonucleotide is complementary to at least a portion of the 5′ end of the nucleic acid and complementary to at least a portion of the second oligonucleotide, and the second oligonucleotide is complementary to at least a portion of the 3′ end of the nucleic acid; (c) contacting the selector probe to the target, wherein the probe hybridizes to the target; (d) joining the selector probe to the target; and (e) separating a nucleic acid joined to the selector probe from a nucleic acid not joined to the selector probe. In addition to the foregoing, some embodiments of the methods also include methods for selective enrichment of a nucleic acid that comprise (a) providing a population of single-stranded nucleic acids, wherein at least some of the nucleic acids in the population include a target, the target comprising the 5′ end of a nucleic acid and the 3′ end of the nucleic acid, said target being hybridized to a selector probe that comprises a first and second oligonucleotide annealed together, wherein the first oligonucleotide comprises a 5′ portion complementary to the 3′ end of the nucleic acid, a spacer portion, and a 3′ portion complementary to the 5′ end of the nucleic acid, the second oligonucleotide being complementary to the spacer portion; (b) joining the selector probe to the target; and (c) separating a nucleic acid joined to the selector probe from a nucleic acid not joined to the selector probe. 
Other embodiments of the methods comprise (a) obtaining a population of single-stranded nucleic acids, wherein at least some of the nucleic acids in the population include a target, the target comprising the 5′ end of a nucleic acid and the 3′ end of the nucleic acid; (b) obtaining a selector probe that includes a first and second oligonucleotide annealed together, wherein the first oligonucleotide comprises a 5′ portion complementary to the 3′ end of the nucleic acid, a spacer portion, and a 3′ portion complementary to the 5′ end of the nucleic acid, the second oligonucleotide being complementary to the spacer portion; (c) contacting the selector probe to the target, wherein the probe hybridizes to the target; (d) joining the selector probe to the target; and (e) separating a nucleic acid joined to the selector probe from a nucleic acid not joined to the selector probe. Some embodiments also include methods for normalizing amplified nucleic acids that include selecting a first population of oligonucleotides having a ratio of oligonucleotides that includes capture moieties to oligonucleotides lacking capture moieties for a first population of oligonucleotides; obtaining a second population of oligonucleotides; amplifying target nucleic acids with the first and second populations of oligonucleotides; and separating amplified targets having incorporated oligonucleotide comprising capture moieties from amplified targets lacking incorporated oligonucleotide capture moieties. In some embodiments, the step of separating further comprises contacting the hybridized target and probe to a binding moiety. In some embodiments, the binding moiety includes avidin and streptavidin. In some embodiments, the binding moiety also includes a bead, microsphere or other particle.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,828,672, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include reagents for nucleic acid preparation comprising a siderophore. In some embodiments, the siderophore is a bacterial siderophore. In some embodiments, the siderophore is a desferrioxamine B (DFO-B) mesylate salt. In some embodiments, the reagents may comprise a DNA polymerase. In some embodiments, the reagents may comprise dNTPs. In some embodiments, the reagents do not contain EDTA. Some embodiments provide methods for preparing a nucleic acid library, which comprise: providing a plurality of nucleic acid molecules from a sample; and manipulating the plurality of nucleic acid molecules in a reagent for nucleic acid preparation comprising a siderophore. In some embodiments, manipulating the plurality of nucleic acid molecules comprises hybridizing the plurality of nucleic acid molecules to a plurality of oligonucleotide probes. In some embodiments, the plurality of nucleic acid molecules and/or the plurality of oligonucleotide probes are immobilized on a support. In some embodiments, the plurality of nucleic acid molecules and/or the plurality of oligonucleotide probes are immobilized on the support through a binding partner pair to the support. In some embodiments, the support is a magnetic bead. In some embodiments, manipulating the plurality of nucleic acid molecules comprises removing oligonucleotide probes not specifically bound to the plurality of nucleic acid molecules. In some embodiments, the methods comprise modifying the oligonucleotide probes specifically bound to the plurality of nucleic acid molecules. In some embodiments, the methods comprise fragmenting the plurality of nucleic acid molecules. In some embodiments, the methods comprise adding adapters to the plurality of nucleic acid molecules. 
In some embodiments, the adapters are added to the plurality of nucleic acid molecules by amplification. Some embodiments provide methods for reducing oxidative damage to a nucleic acid molecule, which methods comprise preparing the nucleic acid molecule in the absence of EDTA. In some embodiments, preparing the nucleic acid molecule comprises preparing the nucleic acid molecule in the presence of a siderophore. In some embodiments, the siderophore is a bacterial siderophore. In some embodiments, the siderophore is a desferrioxamine B (DFO-B) mesylate salt. In some embodiments, preparing the nucleic acid molecule comprises exposing the nucleic acid molecule to Fe(III). In some embodiments, preparing the nucleic acid molecule comprises exposing the nucleic acid molecule to a magnetic bead. In some embodiments, the oxidative damage comprises a point mutation in the nucleic acid molecule. In some embodiments, the point mutation is a C to A transversion. Some embodiments provide methods for increasing the Q (phred) score of a sequencing reaction, which methods comprise preparing a nucleic acid molecule in the absence of EDTA. In some embodiments, preparing the nucleic acid molecule comprises preparing the nucleic acid molecule in the presence of a siderophore. In some embodiments, the siderophore is a bacterial siderophore. In some embodiments, the siderophore is a desferrioxamine B (DFO-B) mesylate salt. In some embodiments, the Q score is greater than about 34. In some embodiments, the Q score is greater than about 38. In some embodiments, the Q score is greater than about 42. In some embodiments, the sequencing reaction is a deep sequencing application. In some embodiments, the deep sequencing application is a cancer-related deep sequencing application. In some embodiments, the methods comprise sequencing the nucleic acid molecule in the absence of EDTA. 
Some embodiments provide kits comprising at least one container means, wherein the at least one container means comprises a reagent for nucleic acid preparation comprising a siderophore. In some embodiments, the siderophore is a desferrioxamine B (DFO-B) mesylate salt. In some embodiments, the reagent does not contain EDTA.
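For orientation on the Q (phred) score thresholds mentioned above, the standard Phred definition relates a quality score Q to a base-call error probability P by Q = -10 log10(P). The short sketch below applies that standard relation to the thresholds of about 34, 38, and 42; the function names are chosen here for illustration.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert a base-call error probability to a Phred quality score."""
    return -10 * math.log10(p)


# Error probabilities implied by the thresholds discussed above.
for q in (34, 38, 42):
    print(f"Q{q}: error probability about {phred_to_error_probability(q):.2e}")
```

Under this definition, every 10-point increase in Q corresponds to a tenfold decrease in the estimated error probability, so a score above about 34 implies fewer than roughly 4 expected errors per 10,000 base calls.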

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,708,655, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions, systems, and methods for detecting events using tethers anchored to or adjacent to nanopores. In some embodiments, a composition includes a nanopore including a first side, a second side, and an aperture extending through the first and second sides; and a permanent tether including a head region, a tail region, and an elongated body disposed therebetween. The head region can be anchored to or adjacent to the first side or second side of the nanopore. The elongated body including a reporter region can be movable within the aperture responsive to a first event occurring adjacent to the first side of the nanopore. In one non-limiting example, the head region can be anchored to a molecule, such as a protein, disposed on the first side or second side of the nanopore. In some embodiments, a method includes providing a nanopore including a first side, a second side, and an aperture extending through the first and second sides; and providing a permanent tether including a head region, a tail region, and an elongated body disposed therebetween. The head region can be anchored to or adjacent to the first or second side of the nanopore, and the elongated body can include a reporter region. The method can include moving the reporter within the aperture responsive to a first event occurring adjacent to the first side of the nanopore. In some embodiments, the reporter region is translationally movable within the aperture responsive to the first event. 
Additionally, or alternatively, the reporter region can be rotationally movable within the aperture responsive to the first event. Additionally, or alternatively, the reporter region can be conformationally movable within the aperture responsive to the first event. In some embodiments, the head region is anchored to or adjacent to the first side or second side of the nanopore via a covalent bond. The head region can be anchored to the first side of the nanopore. The tail region can extend freely toward the second side of the nanopore. In some embodiments, the reporter region is translationally movable toward the first side of the nanopore responsive to the first event. The reporter region can be translationally movable toward the second side after the first event. The reporter region further can be translationally movable toward the first side responsive to a second event occurring adjacent to the first side of the nanopore, the second event being after the first event. The reporter region further can be translationally movable toward the second side after the second event. In some embodiments, the first event includes adding a first nucleotide to a polynucleotide. In embodiments that include a second event, the second event can include adding a second nucleotide to the polynucleotide. An electrical or flux blockade characteristic of the reporter region can be different than an electrical or flux blockade characteristic of another region of the elongated body. In some embodiments, a system can include a composition and measurement circuitry configured to measure a first current or flux through the aperture or to measure a first optical signal while the reporter region is moved responsive to the first event.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,670,530, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of determining a haplotype or partial haplotype of a DNA sample containing high molecular weight segments of genomic DNA. Such methods may be characterized by the following operations: (a) processing the DNA sample to produce an enriched DNA sample enriched for DNA from a first high molecular weight segment having a plurality of alleles from a first haplotype; (b) sequencing DNA in the enriched DNA sample to produce a plurality of sequence reads, which are shorter in length than the first high molecular weight segment, where some of the sequence reads contain a first allele of the first haplotype and other of the sequence reads contain a second allele of the first haplotype; (c) aligning the sequence reads to a reference genome to produce aligned reads, where aligned reads from the first high molecular weight segment tend to cluster into islands on the reference genome; (d) determining distances separating adjacent ones of the aligned reads on the reference genome, where the separation distances between adjacent aligned reads fall into at least two groups distinguishable by the magnitude of their separation distances; (e) selecting a first group of the aligned reads having separation distances to adjacent aligned reads that are smaller than a cutoff value, thereby excluding aligned reads having greater separation distances, where at least a portion of the first group of the aligned reads belong to the same island on the reference genome; and (f) using alleles from the first group of aligned reads to define a first haplotype or first partial haplotype.
In some embodiments, the methods have an additional operation of determining a complete haplotype from the first partial haplotype and other partial haplotypes. In some embodiments, selecting a first group of the aligned reads includes determining the cutoff value. As an example, determining the cutoff value includes: (i) generating a mixture model from the separation distances between adjacent aligned reads, wherein the mixture model fits two distributions to the separation distances; and (ii) determining the cutoff value from a property of at least one of the two distributions. In some cases, each of the two distributions comprises its own central tendency (e.g., the mean of a Gaussian distribution). In certain embodiments, generating a mixture model involves applying an expectation maximization procedure to the separation distances between adjacent aligned reads. In some implementations, determining the cutoff value includes an operation of identifying a fraction of the probability mass of the distribution containing the shorter separation distances. For example, the fraction of the probability mass of the distribution containing the shorter separation distances may be about 80% or greater.
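As a non-limiting illustration of operations (d) and (e) and of the mixture-model cutoff described above, the following minimal sketch fits two Gaussian distributions to the separation distances by expectation maximization and takes the cutoff at a chosen fraction of the probability mass of the shorter-distance distribution. Several details are assumptions made only for this illustration and are not taken from the incorporated patent: the distances are modeled on a log10 scale, the probability mass fraction is set to 0.99, the read positions are synthetic, and the function names (fit_two_gaussians, cutoff_from_gaps, group_into_islands) are hypothetical.

```python
import math
from statistics import NormalDist


def fit_two_gaussians(values, iterations=100):
    """Fit a two-component 1-D Gaussian mixture by expectation maximization."""
    n = len(values)
    mean_all = sum(values) / n
    var_all = max(sum((x - mean_all) ** 2 for x in values) / n, 1e-4)
    mu = [min(values), max(values)]  # start the two components at the extremes
    var = [var_all, var_all]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value (log domain).
        resp = []
        for x in values:
            logp = [
                math.log(weight[k])
                - 0.5 * (math.log(2 * math.pi * var[k]) + (x - mu[k]) ** 2 / var[k])
                for k in range(2)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            resp.append((p[0] / (p[0] + p[1]), p[1] / (p[0] + p[1])))
        # M-step: re-estimate the weight, mean and variance of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk, 1e-4
            )
            weight[k] = max(nk / n, 1e-12)
    return mu, var, weight


def cutoff_from_gaps(gaps, mass=0.99):
    """Cutoff = distance below which `mass` of the short-gap distribution lies."""
    logs = [math.log10(g) for g in gaps]  # assumption: model gaps on a log scale
    mu, var, _ = fit_two_gaussians(logs)
    short = 0 if mu[0] <= mu[1] else 1
    return 10 ** NormalDist(mu[short], math.sqrt(var[short])).inv_cdf(mass)


def group_into_islands(positions, cutoff):
    """Group sorted read positions; a gap at or above the cutoff starts a new island."""
    positions = sorted(positions)
    islands = [[positions[0]]]
    for prev, cur in zip(positions, positions[1:]):
        if cur - prev < cutoff:
            islands[-1].append(cur)
        else:
            islands.append([cur])
    return islands


# Synthetic aligned-read start positions: four islands of seven reads each.
positions = [1000]
for island_gap in (None, 200_000, 300_000, 500_000):
    if island_gap is not None:
        positions.append(positions[-1] + island_gap)
    for gap in (100, 150, 200, 250, 300, 400):
        positions.append(positions[-1] + gap)

gaps = [b - a for a, b in zip(positions, positions[1:])]
cutoff = cutoff_from_gaps(gaps)
islands = group_into_islands(positions, cutoff)
print(round(cutoff), [len(island) for island in islands])
```

In this sketch, lowering the mass argument toward 0.8 gives a stricter cutoff that can split an island at its widest internal gaps, so the chosen fraction trades island completeness against the risk of merging reads from different segments.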

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,587,273, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of selecting a representational sample of nucleic acid sequences from a complex mixture. The method includes: (a) contacting a complex mixture of nucleic acids under conditions sufficient for hybridization with a population of capture probes complementary to one or more nucleic acids comprising a predetermined portion of the sequence collectively present in the complex mixture to form hybridization complexes of the one or more nucleic acids with the population of probes, the population of capture probes being attached to a solid support, and (b) removing unhybridized nucleic acids to select a representational sample of nucleic acids having a complexity of less than 10% but more than 0.001% of the complex mixture, wherein the representational sample comprises a nucleic acid copy having a proportion of each sequence in the copy relative to all other sequences in the copy substantially the same as the proportions of the sequences in the predetermined portion of one or more nucleic acids within the complex mixture. Additionally or alternatively, detection of a genetic biomarker can include a method of selecting a representational sample of genomic sequences from a complete genome. In some embodiments, the method further provides a nucleic acid population that includes a representational sample having a complexity of less than 10% but more than 0.001% of a complex mixture, the representational sample comprising a nucleic acid copy having a proportion of each sequence in the copy relative to all other sequences in the copy substantially the same as the proportions of sequences in a predetermined portion of a sequence collectively present in one or more nucleic acids within the complex mixture.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,574,234, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and compositions for analyzing nucleic acid sequences. In some embodiments, the methods utilize clonal objects, such as DNA balls, that have been captured on beads. Some embodiments provide compositions having a bead and one clonal object, wherein the clonal object is affinity bound or hybridized to the bead, for example, through a patch on the surface of the bead, such as an affinity binding patch or a hybridization patch. In some aspects, the patch includes a plurality of polynucleotides attached to a single region on the surface of the bead. Some embodiments also provide a population of beads having affinity bound or hybridized clonal objects. In particular embodiments, each of the clonal objects is affinity bound or hybridized to the beads through a patch on the surface of each bead, such as an affinity binding patch or a hybridization patch. The ratio of beads to bound or hybridized clonal objects in the population can be 1:1. In particular embodiments, no more than one clonal object is bound or hybridized to any given bead in the population.
Using these methods, compositions can be fabricated wherein a bead and one clonal object are affinity bound or hybridized to each other through attachment to a patch on the surface of the bead. Some embodiments can provide a method of fabricating an affinity binding patch on a bead by providing a bead having a plurality of capture moieties; providing a solid surface having a plurality of capture-complement moieties, wherein the capture-complement moieties further comprise a cleavable moiety and an affinity ligand; specifically binding the capture moieties to the capture-complement moieties, thereby forming an immobilized bead on the solid surface; and cleaving the cleavable moiety so as to retain the affinity ligand on the bead, thereby fabricating an affinity binding patch on the bead. In particular embodiments, the capture moieties or the capture-complement moieties or both include capture sequences of polynucleotides. Accordingly, some embodiments can provide a method of fabricating an affinity binding patch on a bead by providing a bead having a plurality of first polynucleotides attached to the surface of the bead, wherein the first polynucleotides each have a capture sequence; providing a solid surface having a plurality of second polynucleotides attached to the solid surface, wherein the second polynucleotides each have a capture-complement sequence, a cleavable moiety and an affinity ligand; hybridizing the capture sequences of the first polynucleotides to the capture-complement sequences of the second polynucleotides, thereby forming an immobilized bead on the solid surface; and cleaving the second polynucleotides at the cleavable moiety so as to retain the affinity ligand on the second plurality of polynucleotides, thereby fabricating an affinity binding patch on the bead.
In some aspects, the method further includes fabricating one clonal object bound to the affinity binding patch by contacting the affinity ligand with a binding agent, wherein the binding agent has two or more binding sites, and binding one clonal object to the binding agent through a second affinity ligand on the clonal object, wherein the one clonal object has a single tandemly repeated target nucleic acid molecule, thereby fabricating one clonal object bound to the affinity binding patch. Some embodiments can provide a method of fabricating a bead having one clonal object by providing a bead having a plurality of first capture moieties; providing a solid surface having a plurality of second capture moieties patterned into patches on the surface, wherein the second capture moieties each have a cleavable moiety, wherein one clonal object is bound to one patch on the surface via one or more of the second capture moieties, wherein the one clonal object has a single tandemly repeated target nucleic acid molecule; specifically binding the first capture moiety to the clonal object, thereby forming an immobilized bead on the solid surface, and cleaving the cleavable moiety so as to retain the clonal object, thereby fabricating a bead having one clonal object. In particular embodiments, the capture moieties comprise polynucleotides.
Accordingly, some embodiments can provide a method of fabricating a bead having one clonal object by providing a bead having a plurality of first polynucleotides; providing a solid surface having a plurality of second polynucleotides patterned into patches on the surface, wherein the second polynucleotides each have a cleavable moiety, wherein one clonal object is hybridized to one polynucleotide patch on the surface, wherein the one clonal object has a single tandemly repeated target nucleic acid molecule; hybridizing the first polynucleotides to the clonal object, thereby forming an immobilized bead on the solid surface, and cleaving the second polynucleotides at the cleavable moiety so as to retain the clonal object, thereby fabricating a bead having one clonal object. Some embodiments can provide a method of fabricating a hybridization patch on a bead by providing a bead having a plurality of first polynucleotides attached to the surface of the bead, wherein the first polynucleotides each have a first capture sequence, providing a solid surface having a plurality of second polynucleotides attached to the solid surface, wherein the second polynucleotides each have a first capture-complement sequence and a second capture-complement sequence, hybridizing the first capture sequences of the first polynucleotides to the first capture-complement sequence of the second polynucleotides, thereby forming an immobilized bead on the solid surface, and extending the first polynucleotides of the immobilized bead using the second capture-complement sequence as a template, thereby fabricating a hybridization patch of extended first polynucleotides on the bead, the extended first polynucleotides having a second capture sequence.
In some aspects, the method further includes fabricating one clonal object bound to the patch on the bead by providing a clonal object having the second capture-complement sequence, and hybridizing the second capture-complement sequence of the clonal object to the second capture sequences of the bead, thereby fabricating one clonal object bound to the patch on the bead. In some aspects of the method, extending the first polynucleotides includes the addition of one or more nucleoside triphosphates having an affinity ligand, thereby fabricating an affinity binding patch on the bead. Some embodiments include methods of amplifying a target nucleic acid molecule. Some embodiments provide a method of amplifying a target nucleic acid molecule by placing the compositions or populations of beads having affinity bound or hybridized clonal objects onto a solid surface having microwells, wherein only one bead can spatially fit into one microwell and amplifying the target nucleic acid molecules in the microwells, thereby forming amplicons. In some aspects, the method further includes sequencing the amplified target nucleic acid molecules using methods such as sequencing by synthesis, sequencing by ligation or sequencing by hybridization, thereby determining the nucleic acid sequence of the target nucleic acid molecule.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,453,258, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include determination of a sequence, for example a nucleic acid sequence, using a minimal dye set, minimal excitation light sources, and minimal optical emission filters while still allowing for differentiation of the incorporation of all four nucleotides in a sequencing reaction. Additionally or alternatively, detection of a genetic biomarker can include methods for determining the sequence of a polynucleotide comprising detecting in a sequencing reaction the incorporation of three different types of detectable nucleotide conjugates into a polynucleotide and determining the incorporation of a fourth type of nucleotide based on the detection pattern of the three different types of detectable nucleotides into the polynucleotide thereby determining the sequence of a polynucleotide, wherein the incorporation of three different types of detectable nucleotide conjugates is detected from a signal state and wherein the incorporation of the fourth type of nucleotide is determined from a dark state.
Additionally or alternatively, detection of a genetic biomarker can include methods for determining the sequence of a polynucleotide comprising applying to a polynucleotide sample for sequencing a solution comprising four modified nucleotide types wherein three modified nucleotide types are conjugated to one or more detection moieties and one or more linkers positioned between the nucleotide and the one or more detection moieties, and wherein a fourth nucleotide type lacks a detection moiety, detecting a pattern of incorporation of said modified nucleotides in a sequencing reaction thereby capturing a first detectable pattern, applying one or more compositions to the sequencing reaction thereby changing the first detectable pattern, detecting a second detectable pattern, and determining the sequence of the polynucleotide sample based on the detectable patterns. In some embodiments, the polynucleotide for sequencing comprises one or more of deoxyribonucleic acids, modified deoxyribonucleic acids, ribonucleic acids and modified ribonucleic acids. In some embodiments, the polynucleotide for sequencing is a genomic DNA library preparation. In some embodiments, the nucleotide conjugate comprises nucleotide types selected from the group consisting of dATP, dTTP, dUTP, dCTP, dGTP or non-natural nucleotide analogs thereof. In some embodiments, the non-natural nucleotide analog comprises a reversible terminator moiety and is selected from the group consisting of rbATP, rbTTP, rbCTP, rbUTP and rbGTP. In some embodiments, the nucleotide incorporation is sequencing by synthesis, sequencing by ligation, sequencing by hybridization, or a combination thereof. In some embodiments, the three nucleotide type conjugates are detected by detecting a fluorescent moiety. In some embodiments, the fluorescent moiety is the same for the three nucleotide conjugates whereas in other embodiments the fluorescent moiety is one or more different fluorescent moieties.
In some embodiments, the one or more different fluorescent moieties are detected by the same emission filter. In some embodiments, the fluorescent moiety comprises a fluorescent resonance energy transfer system moiety. In some embodiments, the incorporation of the fourth nucleotide is determined by lack of detection. In some embodiments, the detectable nucleic acid conjugates are detected by fluorescence. In some embodiments, the fluorescence is detected by a first and a second imaging event; in further embodiments the first and second imaging events are separated in time. In some embodiments, the first imaging event detects a pattern of fluorescence that is different from the pattern of fluorescence detected by the second imaging event. In some embodiments, the incorporation of one or more nucleotides is determined by the difference in the pattern of fluorescence between the first and second imaging events. In some embodiments, the one or more nucleotide type conjugates further comprise one or more linker sequences; in further embodiments the one or more linker sequences comprise one or more of a cleavable linker and a spacer linker. In some embodiments, the cleavable linker comprises one or more cleavable linkage groups selected from the group consisting of a disulfide, a diol, a diazo, an ester, a sulfone, an azide, an allyl and a silyl ether, whereas in preferred embodiments the cleavable linkage group is a disulfide. In some embodiments, the spacer linker is one or more of polyethylene glycol or concatamers thereof and 2-{2-[3-(2-amino-ethylcarbornyl)-phenoxy]-1-azido-ethoxy}-ethoxy-acetic acid.
In some embodiments, the one or more spacer linkers further comprise one or more cleavable linkage groups wherein the cleavable linkage group is selected from the group consisting of a disulfide, a diol, a diazo, an ester, a sulfone, an azide, an allyl and a silyl ether. In some embodiments, the spacer linker is polyethylene glycol or concatamers thereof whereas in other embodiments the spacer linker is 2-{2-[3-(2-amino-ethylcarbornyl)-phenoxy]-1-azido-ethoxy}-ethoxy-acetic acid. In some embodiments, the one or more nucleotide conjugates comprise a polyethylene glycol linker and a 2-{2-[3-(2-amino-ethylcarbornyl)-phenoxy]-1-azido-ethoxy}-ethoxy-acetic acid linker which may or may not further comprise a hapten and a fluorescent moiety. In some embodiments, the hapten is selected from the group consisting of biotin, digoxigenin and dinitrophenol. In some embodiments, the one or more nucleotide conjugates comprises a streptavidin-fluorescent moiety conjugate whereas in other embodiments, the one or more nucleotide conjugates comprises an anti-hapten antibody-fluorescent moiety conjugate selected from the group consisting of anti-digoxigenin and anti-dinitrophenol. In some embodiments the nucleotide conjugate comprising a polyethylene glycol linker and a 2-{2-[3-(2-amino-ethylcarbornyl)-phenoxy]-1-azido-ethoxy}-ethoxy-acetic acid linker further comprises two fluorescent moieties. In some embodiments, the two fluorescent moieties constitute a fluorescence resonance energy transfer system.
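By way of a non-limiting illustration of determining four nucleotide types from a first and a second imaging event, with the fourth type determined from a dark state, the following minimal sketch maps each pair of intensity readings to a base. The particular assignment of signal states to bases, the intensity threshold of 0.5, the example intensities, and the names SIGNAL_STATES, call_base and call_sequence are hypothetical choices made only for this illustration and are not taken from the incorporated patent.

```python
# Hypothetical signal-state table: (bright in image 1, bright in image 2) -> base.
SIGNAL_STATES = {
    (True, True): "A",    # label detected in both imaging events
    (True, False): "C",   # label detected only in the first imaging event
    (False, True): "T",   # label detected only in the second imaging event
    (False, False): "G",  # dark state: no label detected in either imaging event
}


def call_base(first_intensity, second_intensity, threshold=0.5):
    """Call one base from the intensities of the two imaging events."""
    state = (first_intensity >= threshold, second_intensity >= threshold)
    return SIGNAL_STATES[state]


def call_sequence(first_image, second_image, threshold=0.5):
    """Call one base per sequencing cycle from paired intensity readings."""
    return "".join(
        call_base(a, b, threshold) for a, b in zip(first_image, second_image)
    )


# Normalized intensities for one site over four cycles (made-up values).
first_image = [0.92, 0.81, 0.07, 0.03]
second_image = [0.88, 0.10, 0.95, 0.05]
sequence = call_sequence(first_image, second_image)
print(sequence)  # ACTG
```

Because two binary readings give four distinguishable states, a single fluorescent moiety and a single emission filter can suffice in this sketch, which is consistent with the minimal dye set described above.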

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,441,267, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of determining the identity of a nucleotide at a detection position in a target sequence. The methods comprise providing a hybridization complex comprising the target sequence and a capture probe covalently attached to a microsphere on a surface of a substrate. The methods comprise determining the nucleotide at the detection position. The hybridization complex can comprise the capture probe, a capture extender probe, and the target sequence. In addition, the target sequence may comprise exogenous adapter sequences. In an additional aspect, the method comprises contacting the microspheres with a plurality of detection probes each comprising a unique nucleotide at the readout position and a unique detectable label. The signal from at least one of the detectable labels is detected to identify the nucleotide at the detection position. In an additional aspect, the detection probe does not contain a detection label, but rather is identified based on its characteristic mass, for example via mass spectrometry. In addition, the detection probe comprises a unique label that is detected based on its characteristic mass. Additionally or alternatively, detection of a genetic biomarker can include methods wherein the target sequence comprises a first target domain directly 5′ adjacent to the detection position. The hybridization complex comprises the target sequence, a capture probe and an extension primer hybridized to the first target domain of the target sequence.
The determination step comprises contacting the microspheres with a polymerase enzyme, and a plurality of NTPs each comprising a covalently attached detectable label, under conditions whereby if one of the NTPs basepairs with the base at the detection position, the extension primer is extended by the enzyme to incorporate the label. As is known to those in the art, dNTPs and ddNTPs are the preferred substrates for DNA polymerases. NTPs are the preferred substrates for RNA polymerases. The base at the detection position is then identified. Additionally or alternatively, detection of a genetic biomarker can include methods wherein the target sequence comprises a first target domain directly 5′ adjacent to the detection position, wherein the capture probe serves as an extension primer and is hybridized to the first target domain of the target sequence. The determination step comprises contacting the microspheres with a polymerase enzyme, and a plurality of NTPs each comprising a covalently attached detectable label, under conditions whereby if one of the NTPs basepairs with the base at the detection position, the extension primer is extended by the enzyme to incorporate the label. The base at the detection position is thus identified. Additionally or alternatively, detection of a genetic biomarker can include methods wherein the target sequence comprises (5′ to 3′), a first target domain comprising an overlap domain comprising at least a nucleotide in the detection position and a second target domain contiguous with the detection position. The hybridization complex comprises a first probe hybridized to the first target domain, and a second probe hybridized to the second target domain. The second probe comprises a detection sequence that does not hybridize with the target sequence, and a detectable label. If the second probe comprises a base that is perfectly complementary to the detection position a cleavage structure is formed.
The method further comprises contacting the hybridization complex with a cleavage enzyme that will cleave the detection sequence from the signalling probe and then forming an assay complex with the detection sequence, a capture probe covalently attached to a microsphere on a surface of a substrate, and at least one label. The base at the detection position is thus identified.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,060,431, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include array compositions comprising a substrate with a surface comprising discrete sites. The composition can further comprise a population of microspheres comprising at least a first and a second subpopulation; each subpopulation comprises a bioactive agent; and an identifier binding ligand that will bind a decoder binding ligand such that the identity of the bioactive agent can be elucidated. The microspheres are distributed on the surface. Some embodiments provide array compositions comprising a substrate with a surface comprising discrete sites, and a population of microspheres comprising at least a first and a second subpopulation. Each subpopulation comprises a bioactive agent and does not comprise an optical signature. Some embodiments provide methods of making an array composition as outlined above. The methods comprise forming a surface comprising individual sites on a substrate and distributing microspheres on said surface such that said individual sites contain microspheres. The microspheres comprise at least a first and a second subpopulation each comprising a bioactive agent and do not comprise an optical signature. Some embodiments provide methods of making a composition comprising forming a surface comprising individual sites on a substrate and distributing microspheres on the surface such that the individual sites contain microspheres. The microspheres comprise at least a first and a second subpopulation each comprising a bioactive agent and an identifier binding ligand that will bind a decoder binding ligand such that the identification of the bioactive agent can be elucidated.
Additionally or alternatively, detection of a genetic biomarker can include methods of decoding an array composition comprising providing an array composition as outlined above, and adding a plurality of decoding binding ligands to the array composition to identify the location of at least a plurality of the bioactive agents. Additionally or alternatively, detection of a genetic biomarker can include methods of determining the presence of a target analyte in a sample. The methods comprise contacting the sample with an array composition as outlined above, and determining the presence or absence of the target analyte. Additionally or alternatively, detection of a genetic biomarker can include a method comprising providing an array composition comprising a population of microspheres comprising at least a first and a second subpopulation, wherein each subpopulation comprises a bioactive agent and at least a first and a second decoding attribute, and detecting each of said first and second decoding attributes to identify each of said bioactive agents. Additionally or alternatively, detection of a genetic biomarker can include a method of increasing the information obtained in a decoding step. The method includes the use of degenerate probes as DBL-IBL combinations. Additionally or alternatively, detection of a genetic biomarker can include the use of multiple decoding attributes on a bead. Additionally or alternatively, detection of a genetic biomarker can include a method of increasing the confidence of decoding. The method includes using the decoding as a quality control measure. Additionally or alternatively, detection of a genetic biomarker can include a method of decoding an array composition comprising providing an array composition comprising a population of microspheres comprising at least 50 subpopulations, wherein each subpopulation comprises a bioactive agent, and adding a plurality of decoding binding ligands to said population of microspheres to identify at least 50 of the bioactive agents.
Additionally or alternatively, detection of a genetic biomarker can include a method of determining the presence of a target analyte in a sample comprising contacting said sample with a composition comprising a population of microspheres comprising at least 50 subpopulations, wherein each subpopulation comprises a bioactive agent, adding a plurality of decoding binding ligands to said population of microspheres to identify at least 50 of the bioactive agents, and determining the presence or absence of said target analyte.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,060,431, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include microfluidic devices for the detection of a target analyte in a sample. The devices comprise a solid support that has any number of modules, including a sample inlet port and at least one sample handling well comprising a well inlet port and a well outlet port. The device generally further comprises a first microchannel to allow fluid contact between the sample inlet port and the sample handling well. The device also comprises a detection module comprising a substrate with a surface comprising discrete sites, and a population of microspheres comprising at least a first and a second subpopulation, wherein each subpopulation comprises a bioactive agent. The microspheres are distributed on said surface. The detection module also comprises a detection inlet port to receive the sample. The device also comprises a second microchannel to allow fluid contact between the sample handling well and the detection inlet port. Additionally or alternatively, detection of a genetic biomarker can include a method of assembling a detector in a microfluidic device. The method includes providing a microfluidic device comprising a first microchannel to allow fluid contact between a sample inlet port and a sample handling well, a second microchannel to allow fluid contact between said sample handling well and a detection inlet port, and a detection module comprising a substrate with a surface comprising discrete sites. The method further includes flowing a fluid across the substrate.
The fluid comprises a population of microspheres comprising at least a first and a second subpopulation, wherein each subpopulation comprises a bioactive agent, whereby the beads flow across the discrete sites, and are deposited randomly in the discrete sites. The method additionally includes reversing the flow of the fluid. Additionally or alternatively, detection of a genetic biomarker can include a method of assembling a detector in a microfluidic device. The method includes providing a microfluidic device comprising a plurality of first microchannels, and a population of microspheres in microchannels. The device further includes a receiving chamber connected to said microchannels. The method further includes flowing said microspheres through said microchannels into said receiving chamber.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,222,134, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods, systems and compositions for detecting molecules. Inparticular, methods, systems and compositions for detecting multipletypes of molecules on a solid support. Some embodiments relate tomethods for detecting molecules. In some embodiments, such methodscomprise the steps of (a) providing a solid support comprising moleculesassociated with a site on the solid support such that the molecules aredetected in aggregate during a detection step, wherein the sitecomprises at least two different types of molecules; (b) detecting asignal corresponding to the aggregate of molecules at the site; (c)estimating the fraction of different types of molecules at the site orestimating the amount of signal corresponding to different types ofmolecules at the site; (d) calculating the amount of signalcorresponding to different types of molecules at the site using thefraction estimate, thereby obtaining a signal estimate or calculatingthe fraction of different types of molecules at the site using thesignal estimate, thereby obtaining a fraction estimate; and (e)iteratively updating the fraction estimate and signal estimate until theestimates converge, thereby detecting molecules associated with thesite. In some embodiments of the above-described methods, the providingstep further comprises providing a mixture of molecules to the solidsupport. In other embodiments, the providing step further comprisesassociating the molecules with the site. In still other embodiments, theproviding step further comprises attaching the molecules at the site. 
Insome embodiments of the above-described methods, the estimating step isperformed by guessing the fraction of different types of molecules atthe site or guessing the amount of signal corresponding to differenttypes of molecules at the site. In other embodiments, the estimatingstep comprises performing a principal component analysis (PCA). In someembodiments of the above-described methods, the updating step comprisesperforming a numerical optimization algorithm. In some such methods, thenumerical optimization algorithm is based on an iterative map search. Insome such embodiments, the numerical optimization algorithm is based onFienup's iteration map. In some embodiments of the above-describedmethods, sequence data is obtained for one or more molecules. In somesuch methods, sequence data is obtained by a sequencing-by-synthesisprocess. In certain embodiments, the sequencing-by-synthesis processcomprises a pyrosequencing process. In some embodiments of theabove-described methods, the solid support comprises a bead. In someother embodiments, the solid support comprises a flow-cell. In preferredembodiments of the above-described methods, the molecules comprisenucleic acids. In some such methods, the nucleic acids are attached atthe site. In some embodiments, the nucleic acids comprise a firstsubpopulation of nucleic acids and a second subpopulation of nucleicacids, wherein the nucleic acids of the first subpopulation each have anidentical target region and the nucleic acids of the secondsubpopulation each have an identical region that is a variant of thetarget region. In some embodiments, the nucleotide sequence of thetarget region of the nucleic acids of the first subpopulation has atleast 1 nucleotide that is different as compared to the nucleotidesequence of the variant of the target region of the nucleic acids of thesecond subpopulation. 
In some embodiments, the nucleotide sequence of the target region of the nucleic acids of the first subpopulation has at least 3 nucleotides that are different as compared to the nucleotide sequence of the variant of the target region of the nucleic acids of the second subpopulation. In some embodiments, a nucleotide sequence difference between the target region in the nucleic acids of the first subpopulation and the variant of the target region in the nucleic acids of the second subpopulation comprises at least one difference selected from the group consisting of a mutation, a polymorphism, an insertion, a deletion, a substitution, a simple tandem repeat polymorphism, and a single nucleotide polymorphism (SNP). In some embodiments, the nucleic acids comprise alleles of a genetic locus from a polyploid organism. In some other embodiments, the nucleic acids comprise alternative splicing forms of a nucleic acid. In yet other embodiments, the nucleic acids comprise alleles of a genetic locus from a diploid organism. Also described are molecule detection systems. The molecule detection systems can comprise a solid support comprising molecules associated with a site on the solid support such that the molecules are detected in aggregate, wherein the molecules comprise at least two different types of molecules, and a detector configured to detect the molecules associated with the site. In some embodiments, the molecules are attached at the site. In a preferred embodiment, the molecules comprise nucleic acids. In some embodiments of the molecule detection systems, a site comprises about 2 to about 10^11 molecules, about 2 to about 10^10 molecules, about 2 to about 10^9 molecules, about 2 to about 10^8 molecules, about 2 to about 10^7 molecules, about 2 to about 10^6 molecules, about 2 to about 10^5 molecules, or about 2 to about 10^4 molecules. In some embodiments, the molecules are associated with the site. In other embodiments, the molecules are attached at the site. 
In certain embodiments, themolecules comprise nucleic acids. Some embodiments of theabove-described molecule detection systems can further comprise a fluidhandling system configured to apply fluid to the site. Other embodimentsof the above-described molecule detection systems can further comprise alight source configured to provide an excitation beam to the site. Someembodiments of the above-described molecule detection systems canfurther comprise a first data processing module configured to estimatethe fraction of different types of molecules at the site or the amountof signal corresponding to different types of molecules at the site. Insome embodiments, the first data processing module is also used fordetermining the variation associated with the estimate. In otherembodiments, the determining step is performed using a separate dataprocessing module. In some embodiments of such systems, the systems canfurther comprise a second data processing module configured to calculatethe amount of signal corresponding to different types of molecules atthe site using the fraction estimate or to calculate the fraction ofdifferent types of molecules at the site using the signal estimate. Inother embodiments of such systems, the systems can further comprise athird data processing module configured to iteratively update thefraction estimate and signal estimate. In some embodiments of theabove-described molecule detection systems, the systems are configuredto identify the nucleotide sequence of a target region of a nucleicacid. 
Additionally or alternatively, detection of a genetic biomarkercan include methods of identifying a target region of a nucleic acid.The methods can comprise (a) associating a first subpopulation ofnucleic acids with a site on a solid support, wherein nucleic acids ofthe first subpopulation comprise an identical target region; (b)associating a second subpopulation of nucleic acids with the site on thesolid support, wherein nucleic acids of the second subpopulationcomprise an identical target region that is a variant of the targetregion of the nucleic acids of the first subpopulation; (c) detecting asignal corresponding to one or more nucleotides of the target region offirst subpopulation nucleic acids and one or more nucleotides of thevariant of the target region of second subpopulation nucleic acids; (d)estimating the fraction of first subpopulation nucleic acids and secondsubpopulation nucleic acids associated with the site or estimating theamount of signal corresponding to first subpopulation nucleic acids andsecond subpopulation nucleic acids associated with the site; (e)calculating the amount of signal corresponding to first subpopulationnucleic acids and second subpopulation nucleic acids associated with thesite using the fraction estimate, or calculating the fraction of firstsubpopulation nucleic acids and second subpopulation nucleic acidsassociated with the site using the signal estimate; and (f) iterativelyupdating the fraction estimate and signal estimate until the estimatesconverge, thereby identifying a target region of a nucleic acid. In someembodiments of the above-described methods, step (a) comprises attachingfirst subpopulation nucleic acids and second subpopulation nucleic acidsto the solid support. In some embodiments of the above-describedmethods, step (d) comprises performing a principal component analysis(PCA). In some embodiments of the above-described methods, step (f)comprises performing a numerical optimization algorithm. 
In some suchembodiments, the numerical optimization algorithm is based on iterativemap search. In some other embodiments, the numerical optimizationalgorithm is based on Fienup's iteration map. In some embodiments of theabove-described methods, sequence data is obtained from both first andsecond subpopulation nucleic acids. In some such embodiments, sequencedata is obtained by a sequencing-by-synthesis process. In someembodiments, the sequencing-by-synthesis process comprises apyrosequencing process. Additionally or alternatively, detection of agenetic biomarker can include methods for identifying a biosignature.The methods can comprise the steps of (a) providing samples obtainedfrom a plurality of subjects, wherein the samples comprise molecules;(b) tagging molecules from the samples so as to identify the subjectfrom which each sample originated; (c) associating molecules from thesamples with a site on a solid support such that the molecules aredetected in aggregate during a detection step, wherein the sitecomprises at least two different types of molecules; (d) obtaining abiosignature for molecules associated with the site by: i) detecting asignal corresponding to the aggregate of the molecules at the site, ii)estimating the fraction of different types of molecules at the site orthe amount of signal corresponding to different types of molecules atthe site, iii) calculating the amount of signal corresponding todifferent types of molecules at the site using the fraction estimate, orcalculating the fraction of different types of molecules at the siteusing the signal estimate, and iv) iteratively updating the fractionestimate and signal estimate until the estimates converge, therebyobtaining a biosignature for molecules at the site; and (e) comparingthe biosignature obtained in step (d) to a reference biosignature,thereby identifying the biosignature. In a preferred embodiment, themolecules are attached at the site. 
In a preferred embodiment of theabove-described methods, the molecules comprise nucleic acids. In somesuch embodiments, the nucleic acids comprise a marker from a pathogen.In certain embodiments, the pathogen comprises a pathogen selected fromthe group consisting of a virus, a bacterium and a eukaryotic cell. Insome embodiments, the eukaryotic cell can be a cancer cell. In someembodiments of the above-described methods, the sample comprises anabnormal cell type. In some embodiments of the above-described methods,the sample is obtained from a cancer patient. Also described is a solidsupport including a population of nucleic acids associated with a siteon the solid support such that nucleic acids of the population ofnucleic acids are detected in aggregate, the population of nucleic acidscomprising a first subpopulation and a second subpopulation, whereinnucleic acids of the first subpopulation comprise an identical targetregion and nucleic acids of the second subpopulation comprise anidentical region that is a variant of the target region. Additionally oralternatively, detection of a genetic biomarker can include beadscomprising a first subpopulation of capture nucleic acids having acompetitor molecule hybridized thereto and a second subpopulation ofcapture nucleic acids comprising a region that permits hybridization ofa complementary molecule. Additionally or alternatively, detection of agenetic biomarker can include beads comprising capture nucleic acidshybridized with an amplified nucleic acid comprising a degenerate tag,the degenerate tag being hybridized to a capture nucleic acid. In someembodiments, the bead is present in a channel of a substrate. In otherembodiments, the bead is present in a well of a multiwell substrate. Ina preferred embodiment, the well is configured to hold a single beadhaving the amplified nucleic acids hybridized thereto.
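
The estimate, calculate, and iteratively update steps described above for a site holding two types of nucleic acids can be illustrated with a minimal sketch. It is not the method of U.S. Pat. No. 9,222,134 and does not implement Fienup's iteration map or a PCA-based estimate; it substitutes a simple alternating least-squares scheme. The four-channel signal model, the function names, and the starting guess `f0` are all assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"

def one_hot(base):
    """Idealized four-channel signal for a single base (assumed model)."""
    return [1.0 if b == base else 0.0 for b in BASES]

def mix(f, b1, b2):
    """Aggregate signal when fraction f of molecules show b1 and 1 - f show b2."""
    u, v = one_hot(b1), one_hot(b2)
    return [f * x + (1.0 - f) * y for x, y in zip(u, v)]

def call_bases(signals, f):
    """Signal step: for a fixed fraction, pick the base pair that best fits each cycle."""
    calls = []
    for y in signals:
        best = min(
            product(BASES, repeat=2),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(y, mix(f, *p))),
        )
        calls.append(best)
    return calls

def update_fraction(signals, calls, f_prev):
    """Fraction step: least-squares fraction for fixed base calls."""
    num = den = 0.0
    for y, (b1, b2) in zip(signals, calls):
        u, v = one_hot(b1), one_hot(b2)
        for yk, uk, vk in zip(y, u, v):
            d = uk - vk              # y = v + f * (u - v)
            num += (yk - vk) * d
            den += d * d
    if den == 0.0:                   # no informative (differing) positions
        return f_prev
    return min(1.0, max(0.0, num / den))

def deconvolve(signals, f0=0.6, tol=1e-9, max_iter=100):
    """Alternate the two steps until the fraction estimate converges."""
    f = f0
    for _ in range(max_iter):
        calls = call_bases(signals, f)
        f_new = update_fraction(signals, calls, f)
        converged = abs(f_new - f) < tol
        f = f_new
        if converged:
            break
    calls = call_bases(signals, f)
    seq1 = "".join(b1 for b1, _ in calls)
    seq2 = "".join(b2 for _, b2 in calls)
    return f, seq1, seq2
```

For example, signals built with `mix(0.7, a, b)` for the sequences `"ACGT"` and `"ACTT"` can be passed to `deconvolve` to recover the fraction and both sequences. The problem is symmetric in this toy model: a starting guess below 0.5 would return the complementary fraction with the two sequences swapped.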

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,163,283, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions comprising a plurality of nucleic acids, each nucleic acid comprising an invariant sequence, a variable sequence and a label. Additionally or alternatively, detection of a genetic biomarker can include a method for decoding an array composition. The method includes providing an array composition comprising a substrate with a surface comprising discrete sites and a population of microspheres comprising first and second subpopulations, each subpopulation comprising an identifier nucleic acid sequence comprising a primer sequence and a decoder sequence. The method further comprises adding to the array a first set of combinatorial decoding probes comprising a priming sequence, at least one decoding nucleotide and a label, and detecting the presence of the label.
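
The bookkeeping behind decoding an array composition can be sketched as follows. This is a hypothetical illustration only, not the chemistry or procedure of U.S. Pat. No. 9,163,283: it assumes that each decoding stage reports one label per discrete site, that each label corresponds to one decoding nucleotide, and that the resulting decoder sequence is looked up in a codebook of microsphere subpopulations. The label names, the `LABEL_TO_BASE` mapping, and the codebook entries are invented for the example.

```python
# Hypothetical mapping from detected label to decoding nucleotide.
LABEL_TO_BASE = {"red": "A", "green": "C", "blue": "G", "yellow": "T"}

def decode_sites(readouts, codebook):
    """Assign a microsphere subpopulation to each discrete site.

    readouts: {site: [label seen at stage 1, label seen at stage 2, ...]}
    codebook: {decoder sequence: subpopulation name}
    Sites whose decoded sequence is not in the codebook map to None.
    """
    decoded = {}
    for site, labels in readouts.items():
        decoder_seq = "".join(LABEL_TO_BASE[label] for label in labels)
        decoded[site] = codebook.get(decoder_seq)
    return decoded

# Usage with invented data: two decoding stages, three sites.
codebook = {"AC": "subpopulation_1", "GT": "subpopulation_2"}
readouts = {
    (0, 0): ["red", "green"],
    (0, 1): ["blue", "yellow"],
    (1, 0): ["red", "red"],      # decoder sequence "AA" is not in the codebook
}
assignments = decode_sites(readouts, codebook)
```

With four labels, n decoding stages can distinguish up to 4^n decoder sequences, which is why a combinatorial scheme needs only a few stages to identify many subpopulations.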

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,045,796, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includea method of detecting one or several typable loci contained within agiven genome, where the method includes the steps of providing anamplified representative population of genome fragments having suchtypable loci, contacting the genome fragments with a plurality ofnucleic acid probes having sequences corresponding to the typable lociunder conditions wherein probe-fragment hybrids are formed; anddetecting typable loci of the probe-fragment hybrids. In particularembodiments these nucleic acid probes are at most 125 nucleotides inlength. However, probes having any of a variety of lengths or sequencescan be used as set forth in more detail below. Additionally oralternatively, detection of a genetic biomarker can include a method ofdetecting typable loci of a genome including the steps of providing anamplified representative population of genome fragments that has suchtypable loci, contacting the genome fragments with a plurality ofnucleic acid probes having sequences corresponding to the typable lociunder conditions wherein probe-fragment hybrids are formed; and directlydetecting typable loci of the probe-fragment hybrids. 
Additionally oralternatively, detection of a genetic biomarker can include a method ofdetecting typable loci of a genome including the steps of providing anamplified representative population of genome fragments having thetypable loci; contacting the genome fragments with a plurality ofimmobilized nucleic acid probes having sequences corresponding to thetypable loci under conditions wherein immobilized probe-fragment hybridsare formed; modifying the immobilized probe-fragment hybrids; anddetecting a probe or fragment that has been modified, thereby detectingthe typable loci of the genome. Additionally or alternatively, detectionof a genetic biomarker can include a method, including the steps of (a)providing a plurality of genome fragments, wherein the plurality ofgenome fragments has at least 100 ug of DNA having a complexity of atleast 1 Gigabases; (b) contacting the plurality of genome fragments witha plurality of different immobilized nucleic acid probes, wherein atleast 500 of the different nucleic acid probes hybridize with genomefragments to form probe-fragment hybrids; and (c) detecting typable lociof the probe-fragment hybrids. 
Additionally or alternatively, detection of a genetic biomarker can include a method including the steps of (a) providing a plurality of genome fragments, wherein the plurality of genome fragments has a concentration of at least 1 ug/ul of DNA having a complexity of at least 1 Gigabase; (b) contacting the plurality of genome fragments with a plurality of different immobilized nucleic acid probes, wherein at least 500 of the different nucleic acid probes hybridize with genome fragments to form probe-fragment hybrids; and (c) detecting typable loci of the probe-fragment hybrids. Additionally or alternatively, detection of a genetic biomarker can include a method of amplifying genomic DNA, including the steps of providing isolated double stranded genomic DNA, producing nicked DNA by contacting the double stranded genomic DNA with a nicking agent, and contacting this nicked DNA with a strand displacing polymerase and a plurality of primers, so as to amplify the genomic DNA. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting typable loci of a genome. The method includes the steps of (a) in vitro transcribing a plurality of amplified gDNA fragments, thereby obtaining genomic RNA (gRNA) fragments; (b) hybridizing the gRNA fragments with a plurality of nucleic acid probes having sequences corresponding to the typable loci; and (c) detecting typable loci of the gRNA fragments that hybridize to the probes. Additionally or alternatively, detection of a genetic biomarker can include a method of producing a reduced complexity, locus-specific, amplified representative population of genome fragments. 
The method includes the steps of (a)replicating a native genome with a plurality of random primers, therebyproducing an amplified representative population of genome fragments;(b) replicating a sub-population of the amplified representativepopulation of genome fragments with a plurality of differentlocus-specific primers, thereby producing a locus-specific, amplifiedrepresentative population of genome fragments; and (c) isolating thesub-population, thereby producing a reduced complexity, locus-specific,amplified representative population of genome fragments. Additionally oralternatively, detection of a genetic biomarker can include a method forinhibiting ectopic extension of probes in a primer extension assay. Themethod includes the steps of (a) contacting a plurality of probe nucleicacids with a plurality of target nucleic acids under conditions whereinprobe-target hybrids are formed; (b) contacting the plurality of probenucleic acids with an ectopic extension inhibitor under conditionswherein probe-ectopic extension inhibitor hybrids are formed; and (c)selectively modifying probes in the probe-target hybrids compared toprobes in the probe-ectopic extension inhibitor hybrids. 
Additionally oralternatively, detection of a genetic biomarker can include a methodincluding the steps of (a) contacting a plurality of genome fragmentswith a plurality of different immobilized nucleic acid probes underconditions wherein immobilized probe-fragment hybrids are formed; (b)modifying the immobilized probes while hybridized to the genomefragments, thereby forming modified immobilized probes; (c) removingsaid genome fragments from said probe-fragment hybrids; and (d)detecting the modified immobilized probes after removing the genomefragments, thereby detecting typable loci of the genome fragments.Additionally or alternatively, detection of a genetic biomarker caninclude a method including the steps of (a) representationallyamplifying a native genome, wherein an amplified representativepopulation of genome fragments having the typable loci is produced underisothermal conditions; (b) contacting the genome fragments with aplurality of nucleic acid probes having sequences corresponding to thetypable loci under conditions wherein probe-fragment hybrids are formed;and (c) detecting typable loci of the probe-fragment hybrids.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 8,765,419, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods and systems for detection of pyrophosphate, which can either beused alone or in connection with other technologies, such aspyrosequencing. In some embodiments, such methods and systems permit thedetection of pyrophosphate with reduced background. Additionally oralternatively, detection of a genetic biomarker can include methods andsystems for pyrosequencing of nucleic acids with reduced background.Additionally or alternatively, detection of a genetic biomarker caninclude methods of delaying pyrophosphate detection. The methods caninclude the steps of providing a pyrophosphate sequestering agent,generating pyrophosphate in the presence of the sequestering agent,whereby the pyrophosphate is reversibly sequestered, releasing thepyrophosphate from the sequestering agent and detecting thepyrophosphate. Additionally or alternatively, detection of a geneticbiomarker can include methods for sequencing a nucleic acid. 
The methodcan include the steps of providing nucleotides or nucleotide analogs inthe presence of a pyrophosphate sequestering agent and a pyrophosphatedetecting agent, incorporating one or more of the nucleotides ornucleotide analogs into a polynucleotide so as to extend thepolynucleotide in the presence of the pyrophosphate sequestering agent,thereby generating sequestered pyrophosphate, removing theunincorporated nucleotides or nucleotide analogs from the presence ofthe pyrophosphate detecting agent, releasing the pyrophosphate from thesequestering agent in the presence of the pyrophosphate detecting agentand detecting released pyrophosphate, wherein released pyrophosphateindicates that one or more nucleotides or nucleotide analogs have beenincorporated into the polynucleotide. Additionally or alternatively,detection of a genetic biomarker can include methods of modulating theavailability of free pyrophosphate during sequencing of a nucleic acidmolecule. These methods can include the steps of combining nucleotidesor nucleotide analogs with a nucleic acid template; incubating thenucleic acid template and the nucleotides or nucleotide analogs togetherwith a polymerase and a pyrophosphate sequestering agent underconditions sufficient to form a polynucleotide complementary to all or aportion of the nucleic acid template, wherein pyrophosphate generatedduring the incubating is reversibly sequestered by the sequesteringagent, removing from the nucleic acid template the nucleotides ornucleotide analogs that have not been incorporated into thepolynucleotide and releasing the pyrophosphate from the pyrophosphatesequestering agent by providing a release reagent. Additionally oralternatively, detection of a genetic biomarker can include arrayscomprising a solid support having a plurality of sites distributedthereon, wherein at least a portion of the sites comprise a templatenucleic acid and a pyrophosphate sequestering agent capable ofreversibly sequestering pyrophosphate. 
In certain aspects, the sitescomprise wells. In certain aspects, the template nucleic acid isattached to a particle or bead within the wells. In certain aspects, thewells further comprise beads having a pyrophosphate detecting agentattached thereto. In certain aspects, the pyrophosphate sequesteringagent is disposed between the template nucleic acid and thepyrophosphate detecting agent. In certain aspects, the pyrophosphatedetecting agent comprises ATP sulfurylase and luciferase. In certainaspects, the wells further comprise packing beads. In some embodiments,pyrophosphate is reversibly sequestered by adsorption with thesequestering agent. In certain aspects, the sequestering agent comprisesa cationic agent capable of sequestering pyrophosphate throughchelation, complexation, or adsorption. In certain aspects, the cationicagent comprises an agent selected from the group consisting of a metal,metal salt, a metal oxide or other agent set forth below. In certainaspects, the metal or metal oxide comprises Ti or TiO2. In otheraspects, the pyrophosphate sequestering agent comprises hydroxyapatite.In other aspects the sequestering agent comprises an ammonium orsubstituted ammonium salt, or a resin or bead that contains such groups.In certain aspects of the above embodiments, the pyrophosphatesequestering agent comprises particles or beads. In addition to theforegoing, in some embodiments of the methods and arrays, pyrophosphatecan be released from the sequestering agent by providing a releasereagent to the sequestering agent. In certain aspects, the releasereagent comprises an anion capable of displacing the pyrophosphate fromthe sequestering agent, for example, by preferentially complexing orchelating the cation of the sequestering agent. 
In certain aspects, the release reagent comprises an agent selected from the group consisting of an acid or salt of an acid such as oxalic acid, an oxalate salt, sulfamic acid, a sulfamate salt, ethylene diamine tetraacetic acid (EDTA), ethylene glycol-bis-β-amino-ethyl ether N,N,N′,N′-tetra-acetic acid (EGTA), citric acid, tartaric acid, acetic or other carboxylic acids or their salts. In other aspects, the release reagent comprises phosphate. In other aspects, the release reagent comprises a bisphosphonate. In certain aspects, the release reagent is the enzyme ATP sulfurylase. In this particular aspect, the ATP sulfurylase is in solution rather than being bound to a bead or other surface. The ATP sulfurylase can release the pyrophosphate from the sequestering agent by transforming the pyrophosphate into ATP in the presence of adenosine phosphosulfate (APS). Typically, the ATP will have a lower binding affinity for the sequestering agent than does pyrophosphate. In some aspects, arrays can include sites that further comprise a polymerase and nucleotides or nucleotide analogs. In some embodiments, arrays can further comprise at least one electrode capable of producing an electric field in the presence of the sites. Additionally or alternatively, detection of a genetic biomarker can include methods of making an array. The methods can include the steps of providing a solid support having a plurality of sites distributed thereon and providing a template nucleic acid and a pyrophosphate sequestering agent capable of reversibly sequestering pyrophosphate to at least a portion of the sites. In certain aspects, the step of providing the template nucleic acid to the plurality of sites occurs prior to providing the pyrophosphate sequestering agent. In certain aspects, the step of providing the template nucleic acid to the plurality of sites occurs subsequent to providing the pyrophosphate sequestering agent. 
In certainaspects, the step of providing the template nucleic acid to theplurality of sites occurs at the same time as providing thepyrophosphate sequestering agent. In some embodiments, arraysmanufactured according to the methods above can be employed in thesequencing and/or pyrophosphate sequestering and release processes. Forexample, in some embodiments, the pyrophosphate is reversiblysequestered by adsorption with the sequestering agent. In certainaspects, the sequestering agent comprises a cationic agent capable ofsequestering pyrophosphate through chelation, complexation, oradsorption. In certain aspects, the cationic agent comprises an agentselected from the group consisting of a metal, metal salt, a metal oxideor other agent set forth below. In certain aspects, the metal or metaloxide comprises Ti or TiO2. In other aspects, the pyrophosphatesequestering agent comprises hydroxyapatite. In other aspects, thesequestering agent comprises an ammonium or substituted ammonium salt,or a resin or bead that contains such groups. In certain aspects of theabove embodiments, the pyrophosphate sequestering agent comprisesparticles or beads. In addition to the foregoing, in some embodiments,arrays manufactured according to the above methods can be utilized inprocesses in which pyrophosphate can be released from the sequesteringagent by providing a release reagent to the sequestering agent. Incertain aspects, the release reagent comprises an anion capable ofdisplacing the pyrophosphate from the sequestering agent, for example,by preferentially complexing or chelating the cation of the sequesteringagent. 
In certain aspects, the release reagent comprises an agent selected from the group consisting of an acid or salt of an acid such as oxalic acid, an oxalate salt, sulfamic acid, a sulfamate salt, ethylene diamine tetraacetic acid (EDTA), ethylene glycol-bis-β-amino-ethyl ether N,N,N′,N′-tetra-acetic acid (EGTA), citric acid, tartaric acid, acetic or other carboxylic acids or their salts. In other aspects, the release reagent comprises phosphate. In other aspects, the release reagent comprises a bisphosphonate. In certain aspects, the release reagent is the enzyme ATP sulfurylase. In this particular aspect, the ATP sulfurylase is in solution rather than being bound to a bead or other surface. The ATP sulfurylase can release the pyrophosphate from the sequestering agent by transforming the pyrophosphate into ATP in the presence of adenosine phosphosulfate (APS). Typically, the ATP will have a lower binding affinity for the sequestering agent than does pyrophosphate.
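
The relationship between nucleotide incorporation, sequestered pyrophosphate, and the detected signal in the sequencing methods above can be sketched with a toy simulation. It is an idealized model, not the procedure of U.S. Pat. No. 8,765,419: it assumes one pyrophosphate per incorporated nucleotide, complete sequestration and complete release, and a cyclic nucleotide dispensation order `FLOW_ORDER` chosen arbitrarily for illustration. The function names are likewise hypothetical.

```python
FLOW_ORDER = "TACG"  # hypothetical cyclic order in which nucleotides are provided

def flowgram(synthesized_strand, n_flows):
    """Return the pyrophosphate count detected after each nucleotide flow.

    synthesized_strand is the sequence of the strand being extended
    (the complement of the template), read in the direction of synthesis.
    """
    position = 0
    detected = []
    for flow in range(n_flows):
        nucleotide = FLOW_ORDER[flow % len(FLOW_ORDER)]
        sequestered = 0
        # Extension: every matching base in a homopolymer run is incorporated,
        # and each incorporation yields one pyrophosphate held by the agent.
        while (position < len(synthesized_strand)
               and synthesized_strand[position] == nucleotide):
            sequestered += 1
            position += 1
        # Unincorporated nucleotides are removed (not modeled), then the
        # release reagent frees the pyrophosphate for detection.
        detected.append(sequestered)
    return detected

def decode(detected):
    """Rebuild the synthesized sequence from the per-flow signal."""
    return "".join(
        FLOW_ORDER[flow % len(FLOW_ORDER)] * count
        for flow, count in enumerate(detected)
    )

signal = flowgram("TTAGC", 8)   # [2, 1, 0, 1, 0, 0, 1, 0]
```

Because detection happens only after the release step, the signal for each flow reflects incorporation alone and not the nucleotides that were washed away, which is the reduced-background rationale described in the text.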

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 8,741,630, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods of detecting a target analyte in a biological sample, comprisingproviding a composite array comprising a substrate having a surface; afirst and a second assay location on said surface, wherein said assaylocations comprise a population of microspheres, and wherein saidmicrospheres comprise bioactive agents; a physical partition separatingsaid first assay location from said second assay location; adding saidbiological sample to said first assay location under conditionssufficient to allow said target analyte to bind to said bioactiveagents; and detecting the binding of said bioactive agents to saidtarget analyte. In more specific embodiments, the binding of thebioactive agents to the target analyte can be detected by a change in anoptical signature of the microspheres. Target analytes and bioactiveagents can include a nucleic acid, for example. In some embodiments, themethods can further comprise detecting a target analyte in a secondbiological sample by adding said second biological sample to said secondassay location and thereafter detecting the binding of said bioactiveagents to said target analyte. In certain aspects, the substrate caninclude a microscope slide. 
Further methods include detecting a targetnucleic acid in a biological sample, wherein said target nucleic acidincludes one or more single nucleotide polymorphisms (SNPs) at one ormore predetermined positions, comprising providing a composite arraycomprising a substrate having a surface; a population of microspheres,wherein said microspheres are linked to capture probes configured tobind to said target nucleic acid at said one or more predeterminedpositions; a first and second assay location on said surface, whereinsaid assay locations comprise said population of microspheres; aphysical partition separating said first assay location from said secondassay location; adding said biological sample to said first assaylocation under conditions sufficient to allow said target nucleic acidto bind to said capture probes; and detecting the binding of said targetnucleic acid to said capture probes. Additionally or alternatively,detection of a genetic biomarker can include array compositionscomprising a rigid support; a molded layer with at least a first assaylocation comprising discrete sites, where the molded layer is adhered tothe rigid support; a layer of bonding agent adhering the rigid supportto the molded layer; and a population of microspheres comprising atleast a first and a second subpopulation, where the first subpopulationcomprises a first bioactive agent and the second subpopulation comprisesa second bioactive agent where the microspheres are randomly distributedon the sites. 
Additionally or alternatively, detection of a genetic biomarker can include a method for making an array composition containing at least a first assay location having discrete sites comprising the steps of contacting a surface of a template structure, the surface comprising one or more sets of projections, with a moldable material; removing the moldable material from the surface of the template structure, whereby the removed moldable material forms a molded layer with at least a first assay location comprising discrete sites; adhering the molded layer to a rigid support; and randomly distributing microspheres on the molded layer such that individual discrete sites comprise microspheres, where the microspheres comprise at least a first and a second subpopulation, where the first subpopulation comprises a first bioactive agent and the second subpopulation comprises a second bioactive agent.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,901,897, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include composite array compositions comprising a first substrate with a surface comprising a plurality of assay locations, each assay location comprising a plurality of discrete sites. The substrate further comprises a population of microspheres comprising at least a first and a second subpopulation, wherein each subpopulation comprises a bioactive agent. The microspheres are distributed on each of the assay locations. Additionally or alternatively, detection of a genetic biomarker can include composite array compositions comprising a first substrate with a surface comprising a plurality of assay locations and a second substrate comprising a plurality of array locations, each array location comprising discrete sites. The compositions further comprise a population of microspheres comprising at least a first and a second subpopulation, wherein each subpopulation comprises a bioactive agent. The microspheres are distributed on each of the array locations. Additionally or alternatively, detection of a genetic biomarker can include methods of decoding an array composition comprising providing an array composition as outlined above, and adding a plurality of decoding binding ligands to the composite array composition to identify the location of at least a plurality of the bioactive agents. Additionally or alternatively, detection of a genetic biomarker can include methods of determining the presence of one or more target analytes in one or more samples comprising contacting the sample with the composition, and determining the presence or absence of said target analyte. Additionally or alternatively, detection of a genetic biomarker can include a hybridization chamber. 
The hybridization chamber includes a base plate and a lid. A sealant is localized between the lid and base plate to provide for an airtight seal. When a two-component array system is used, the chamber also includes component ports in the lid to immobilize the array components. That is, array components are inserted through the port in the lid. The ports may include seals so that an airtight seal is maintained. The chamber also may include clamps and alignment pins. Additionally or alternatively, detection of a genetic biomarker can include a hybridization chamber wherein the base plate contains holes. The holes may be in a microplate array format. In one embodiment, at least two holes are joined by a channel. In one embodiment, a flexible membrane is placed on the base plate. When pressure, i.e., a vacuum, is applied to the membrane, wells form in the membrane at the location of the holes in the base plate. The apparatus also includes a pneumatic device for the delivery of a vacuum or positive pressure to the membrane. Additionally or alternatively, detection of a genetic biomarker can include a method of mixing samples in an array format. The method includes providing a vacuum to the membrane such that wells are formed. A solution is then applied to the membrane such that at least one of the wells is filled with liquid. Subsequently, the vacuum is applied intermittently to the membrane, which results in mixing of the liquid. Additionally or alternatively, detection of a genetic biomarker can include an apparatus comprising a hybridization chamber and any of the composite array compositions. Additionally or alternatively, detection of a genetic biomarker can include performing methods of decoding an array composition in a hybridization chamber. Additionally or alternatively, detection of a genetic biomarker can include performing methods of determining the presence of one or more target analytes in one or more samples in a hybridization chamber.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,288,103, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of detecting target sequences in a sample comprising providing a first solid support comprising at least a first and a second target sequence, contacting the first and second target sequences with first and second probes, respectively, wherein each of the first and second probes comprise a first universal priming site and a target specific domain substantially complementary to at least a portion of the target sequence, to form first and second hybridization complexes, respectively, removing unhybridized probes, contacting the first and second hybridization complexes with a first enzyme to form modified first and second probes, respectively, contacting the modified first and second probes with at least a first primer that hybridizes to the universal priming site, NTPs, and an extension enzyme, wherein the first and second modified probes are amplified to form first and second amplicons, respectively, and detecting the amplicons. 
Additionally or alternatively, detection of a genetic biomarker can include a method of detecting target sequences in a sample comprising providing a first solid support comprising at least a first and a second target sequence, contacting the first and second target sequences with first and second probes, respectively, wherein each of the first and second probes comprise a first universal priming site and a target specific domain substantially complementary to at least a portion of the target sequence, to form first and second hybridization complexes, respectively, removing unhybridized probes, contacting the first and second probes with at least a first universal primer that hybridizes to the universal priming site, NTPs and an extension enzyme, wherein the first and second probes are extended to form first and second modified probes, respectively, contacting the first and second modified probes with at least third and fourth probes, respectively, wherein the modified first and second probes comprise a detection position, the third and fourth probes each comprise an interrogation position, and a second enzyme, wherein the second enzyme only modifies the third and fourth probes if there is perfect complementarity between the bases at the interrogation position and the detection position, forming third and fourth modified probes, and detecting the third and fourth modified probes. 
Additionally or alternatively, detection of a genetic biomarker can include a method comprising providing a plurality of target nucleic acid sequences each comprising from 3′ to 5′ a first, second and third target domain, the first target domain comprising a detection position, the second target domain being at least one nucleotide, contacting the target nucleic acid sequences with sets of probes for each target sequence, each set comprising a first probe comprising from 5′ to 3′ a first domain comprising a first universal priming sequence, and a second domain comprising a sequence substantially complementary to the first target domain of a target sequence, and an interrogation position within the 3′ four terminal bases, a second probe comprising a first domain comprising a sequence substantially complementary to the third target domain of a target sequence, to form a set of first hybridization complexes, contacting the first hybridization complexes with an extension enzyme and dNTPs, under conditions whereby if the bases at the interrogation positions are perfectly complementary with the bases at the detection positions, extension of the first probes occurs through the second target domains to form second hybridization complexes, contacting the second hybridization complexes with a ligase to ligate the extended first probes to the second probes to form amplification templates. Additionally or alternatively, detection of a genetic biomarker can include a multiplex reaction method comprising providing a sample comprising at least first and second targets, hybridizing the first and second targets with first and second probes, respectively, forming first and second hybridization complexes, respectively, immobilizing the first and second hybridization complexes, washing to remove unhybridized nucleic acids, contacting the first and second hybridization complexes with an enzyme, whereby the first and second probes are modified forming modified first and second probes, respectively, whereby the 
modified first and second probes are modified to contain first and second interrogation nucleotides that are complementary to first and second detection nucleotides in the first and second targets, respectively, contacting the modified first and second probes with first and second allele specific primers, respectively, whereby the first and second allele specific primers hybridize to the modified first and second probes, respectively, 5′ to the first and second interrogation nucleotides, dNTPs, polymerase, whereby the first and second allele specific primers are modified when a target domain of the allele specific primers is perfectly complementary to the modified target probes to form modified first and second allele specific probes, amplifying the modified first and second allele specific probes to form first and second amplicons, and detecting the first and second amplicons. Additionally or alternatively, detection of a genetic biomarker can include a method comprising providing a plurality of target nucleic acid sequences each comprising from 3′ to 5′ a first, second and third target domain, the first target domain comprising a detection position, the second target domain being at least one nucleotide, contacting the target nucleic acid sequences with sets of probes for each target sequence, each set comprising: a first probe comprising from 5′ to 3′, a first domain comprising a first universal priming sequence, and a second domain comprising a sequence substantially complementary to the first target domain of a target sequence, and an interrogation position within the 3′ four terminal bases, a second probe comprising a first domain comprising a sequence substantially complementary to the third target domain of a target sequence, to form a set of first hybridization complexes, contacting the first hybridization complexes with at least a first universal primer that hybridizes to the first universal priming sequence, an extension enzyme and dNTPs, under conditions whereby if the bases at the interrogation positions are perfectly complementary with the bases at the detection positions, extension of the first probes occurs through the second target domains to form second hybridization complexes, contacting the second hybridization complexes with a ligase to ligate the extended first probes to the second probes to form amplification templates.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,899,626, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of measuring the methylation level of DNA. The method can include the steps of providing data representing the standard deviation of methylation measurements of DNA, determining the methylation level of at least one locus in a sample DNA and comparing the methylation level of the at least one locus to the data to determine the standard deviation of the measurement. Additionally or alternatively, detection of a genetic biomarker can include a method of comparing the methylation level of DNA samples. The method can include the steps of providing data representing the standard deviation of methylation measurements of DNA; determining the methylation level of at least one locus in a first sample DNA; determining the methylation level of the at least one locus in a second sample DNA; identifying the standard deviations of the methylation level of the at least one locus in the first sample DNA and in the second sample DNA from the data; and determining whether the methylation level of the at least one locus in the first sample DNA and in the second sample DNA are the same or different based on the standard deviations. Additionally or alternatively, detection of a genetic biomarker can include a DNA methylation level detection system, including a scanner for reading methylation levels for a plurality of loci in a sample DNA and a first module configured to compare the methylation levels against data representing the standard deviation of methylation measurements of DNA. 
Some embodiments relate to a method of measuring the methylation level of DNA including providing data representing the standard deviation of methylation measurements of DNA; determining the methylation level of at least one locus in a sample DNA; and comparing the methylation level of said at least one locus to said data to determine the standard deviation of said measurement. In some embodiments, at least one locus comprises a plurality of loci. In some embodiments, the methylation levels are determined using an array. In some embodiments, the plurality of loci comprises at least 100 loci measured simultaneously on said array. In some embodiments, the data correlates standard deviation of methylation level as a function of methylation level. In some embodiments, the data comprises different standard deviation values for different methylation levels. In some embodiments, the data comprises said different standard deviation values occurring along a parabola when correlated to said different methylation levels. In some embodiments, the data is produced by creating a training set comprising mixtures of DNA with varying methylation levels, wherein said training set comprises replicates of said mixtures; determining the methylation level of at least one locus in said mixtures of said training set; determining standard deviation values for said methylation levels determined for said replicates of said training set; and correlating said standard deviation values and said methylation levels determined for said training set. In some embodiments, the mixtures of the training set comprise different ratios of genomic DNA from a cell population with highly methylated DNA and a cell population with minimally methylated DNA. In some embodiments, the methylation levels for the mixtures of said training set vary from 0 to 1. 
Some embodiments further include identifying at least three regions from 0 to 1, determining the median of the methylation levels for each of the regions and fitting a parabola to said median for each of said regions. In some embodiments, the standard deviation values comprise the 95th percentile standard deviation values. Some embodiments further include the steps of determining the methylation level of said at least one locus in a second sample DNA; identifying the standard deviations of said methylation level of said at least one locus in said sample DNA and in said second sample DNA from said data; and determining whether said methylation level of said at least one locus in said first sample DNA and in said second sample DNA are the same or different based on said standard deviations. Some embodiments relate to a DNA methylation level detection system including a scanner for reading methylation levels for a plurality of loci in a sample DNA; and a first module configured to compare said methylation levels against data representing the standard deviation of methylation measurements of DNA. In some embodiments, the methylation levels are determined using an array. In some embodiments, the plurality of loci comprises at least 100 loci measured simultaneously on said array. In some embodiments, the data correlates standard deviation of methylation level as a function of methylation level. In some embodiments, the data comprises different standard deviation values for different methylation levels. 
In some embodiments, the data comprises said different standard deviation values occurring along a parabola when correlated to said different methylation levels. In some embodiments, the data is produced by creating a training set comprising mixtures of DNA with varying methylation levels, wherein said training set comprises replicates of said mixtures; determining the methylation level of at least one locus in said mixtures of said training set; determining standard deviation values for said methylation levels determined for said replicates of said training set; and correlating said standard deviation values and said methylation levels determined for said training set. Some embodiments relate to a method of comparing the methylation level of DNA samples including providing data representing the standard deviation of methylation measurements of DNA; determining the methylation level of at least one locus in a first sample DNA; determining the methylation level of said at least one locus in a second sample DNA; identifying the standard deviations of said methylation level of said at least one locus in said first sample DNA and in said second sample DNA from said data; and determining whether said methylation level of said at least one locus in said first sample DNA and in said second sample DNA are the same or different based on said standard deviations.
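By way of illustration only, the following Python sketch shows one way the standard deviation data described above could be modeled and used to compare two samples. It is a minimal sketch under stated assumptions, not the method of U.S. Pat. No. 7,899,626: the function names (fit_parabola, build_sd_model, levels_differ), the choice of exactly three equal-width regions, the use of the median standard deviation within each region, and the decision rule (a difference larger than k combined standard deviations, with k = 2.0) are all hypothetical choices made for the example.

```python
from statistics import median


def fit_parabola(points):
    """Return (a, b, c) of y = a*x**2 + b*x + c passing through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = points
    d1 = (x1 - x2) * (x1 - x3)
    d2 = (x2 - x1) * (x2 - x3)
    d3 = (x3 - x1) * (x3 - x2)
    # Coefficients obtained by expanding the three Lagrange basis polynomials.
    a = y1 / d1 + y2 / d2 + y3 / d3
    b = -(y1 * (x2 + x3) / d1 + y2 * (x1 + x3) / d2 + y3 * (x1 + x2) / d3)
    c = y1 * x2 * x3 / d1 + y2 * x1 * x3 / d2 + y3 * x1 * x2 / d3
    return a, b, c


def build_sd_model(training, edges=(0.0, 1 / 3, 2 / 3, 1.0)):
    """Build a function mapping a methylation level (0 to 1) to an expected SD.

    training: (methylation_level, standard_deviation) pairs from replicate
    mixtures. Assumption: three regions, each containing at least one pair.
    """
    training = list(training)
    points = []
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        is_last = i == len(edges) - 2
        region = [(m, s) for m, s in training
                  if lo <= m < hi or (is_last and m == hi)]
        # One representative point per region: median level and median SD.
        points.append((median(m for m, _ in region),
                       median(s for _, s in region)))
    a, b, c = fit_parabola(points)
    return lambda level: a * level * level + b * level + c


def levels_differ(level_1, level_2, sd_of, k=2.0):
    """Hypothetical rule: levels differ if they are more than k combined SDs apart."""
    combined_sd = (sd_of(level_1) ** 2 + sd_of(level_2) ** 2) ** 0.5
    return abs(level_1 - level_2) > k * combined_sd


if __name__ == "__main__":
    # Illustrative (made-up) training pairs, one per region.
    sd_of = build_sd_model([(0.0, 0.01), (0.5, 0.06), (1.0, 0.01)])
    print(levels_differ(0.10, 0.90, sd_of))
    print(levels_differ(0.50, 0.52, sd_of))
```

With more than three regions, a least-squares fit would replace the exact three-point fit used here; where 95th percentile standard deviation values are desired, the per-region median of the standard deviations would be swapped for that percentile.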

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,776,531, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a probe composition, including (a) a substrate; (b) a probe molecule attached to the substrate; and (c) a stabilization polymer layer on the substrate, wherein said stabilization polymer layer coats the probe molecule. Additionally or alternatively, detection of a genetic biomarker can include a method of making a probe composition. The method includes the steps of (a) providing a substrate having an attached biopolymer probe; and (b) contacting the substrate with a stabilization polymer. Additionally or alternatively, detection of a genetic biomarker can include a method of shipping a solid-phase probe. The method includes the steps of (a) providing a substrate having an attached probe molecule, and further having a stabilization polymer layer; (b) placing the substrate in a package; and (c) shipping the package to a remote location.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,499,806, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include array compositions comprising a substrate with a surface comprising discrete sites, at least one fiducial, and a population of microspheres comprising at least a first and a second subpopulation. Each subpopulation comprises a bioactive agent, and the microspheres are distributed on said surface. Each subpopulation may optionally comprise a unique optical signature, an identifier binding ligand that will bind a decoder binding ligand such that the identification of the bioactive agent can be elucidated, or both. Additionally or alternatively, detection of a genetic biomarker can include compositions comprising a computer readable memory to direct a computer to function in a specified manner. The computer readable memory comprises an acquisition module for receiving a data image of a random array comprising a plurality of discrete sites, a registration module for registering a data image, and a comparison module for comparing registered data images. Each module comprises computer code for carrying out its function. 
The registration module may utilize any number of fiducials, including a fiducial fiber when the substrate comprises a fiber optic bundle, a fiducial microsphere, or a fiducial template generated from the random array. Additionally or alternatively, detection of a genetic biomarker can include methods of making the array compositions comprising forming a surface comprising individual sites on a substrate, distributing microspheres on the surface such that the individual sites contain microspheres, and incorporating at least one fiducial onto the surface. When the array has complete rotational freedom, at least two fiducials are preferred in the array to allow for correction of rotation. Additionally or alternatively, detection of a genetic biomarker can include methods for comparing separate data images of a random array. The methods comprise using a computer system to register a first data image of the random array to produce a registered first data image, using the computer system to register a second data image of the random array to produce a registered second data image, and comparing the first and the second registered data images to determine any differences between them. Additionally or alternatively, detection of a genetic biomarker can include methods of decoding a random array composition comprising providing a random array composition. A first plurality of decoding binding ligands is added to the array composition and a first data image is created. A fiducial is used to generate a first registered data image. A second plurality of decoding binding ligands is added to the array composition and a second data image is created. The fiducial is used to generate a second registered data image. A computer system is used to compare the first and the second registered data image to identify the location of at least two bioactive agents. Additionally or alternatively, detection of a genetic biomarker can include methods of determining the presence of a target analyte in a sample. 
The methods comprise acquiring a first data image of a random array composition, and registering the first data image to create a registered first data image. The sample is then added to the random array and a second data image is acquired from the array. The second data image is registered to create a registered second data image. Then the first and the second registered data images are compared to determine the presence or absence of the target analyte. Optionally, the data acquisition may be at different wavelengths. Additionally or alternatively, detection of a genetic biomarker can include methods for preprocessing or prefiltering signal data comprising acquiring a data image from an array, and determining the similarity of a first signal from at least one array site to a reference signal to determine whether the site comprises a candidate bead. Additionally or alternatively, detection of a genetic biomarker can include methods for registering an analytical image of a microsphere array comprising providing a hybridization intensity image. After the microsphere array is decoded, a registration grid is computed based on known locations of bioactive agents on the microspheres obtained from the decoding step. The sample is added to the microsphere array and a hybridization intensity image is acquired from the array. Bright bead types are distributed throughout the array to serve as fiducials. The registration grid is overlaid onto the image and then the registration grid is aligned so that the identity of the signal intensity at each grid location for each bead type within the array is ascertained. Once the correct position of the grid is obtained, each core is assigned a number so that the correct placement of the grid can be made for further sequential images.
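By way of illustration only, the following Python sketch shows why two fiducials suffice to correct rotation and translation when registering a data image of a random array. It is a minimal sketch, not the registration module of U.S. Pat. No. 7,499,806: the function names (estimate_rigid_transform, register_point) and the assumption of a rigid transform with no scaling or distortion are hypothetical choices made for the example.

```python
import math


def estimate_rigid_transform(ref_a, ref_b, obs_a, obs_b):
    """Estimate rotation (radians) and translation mapping reference to observed.

    ref_a, ref_b: (x, y) positions of two fiducials in the reference image.
    obs_a, obs_b: positions of the same two fiducials in the new image.
    Assumption: the images differ only by a rotation and a translation.
    """
    ref_angle = math.atan2(ref_b[1] - ref_a[1], ref_b[0] - ref_a[0])
    obs_angle = math.atan2(obs_b[1] - obs_a[1], obs_b[0] - obs_a[0])
    theta = obs_angle - ref_angle
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # The translation is whatever carries the rotated first fiducial onto
    # its observed position.
    tx = obs_a[0] - (cos_t * ref_a[0] - sin_t * ref_a[1])
    ty = obs_a[1] - (sin_t * ref_a[0] + cos_t * ref_a[1])
    return theta, tx, ty


def register_point(observed, theta, tx, ty):
    """Map an observed (x, y) position back into reference coordinates."""
    x, y = observed[0] - tx, observed[1] - ty
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Apply the inverse rotation.
    return (cos_t * x + sin_t * y, -sin_t * x + cos_t * y)


if __name__ == "__main__":
    # Made-up coordinates: the second image is rotated 90 degrees and
    # shifted by (5, 5) relative to the first.
    theta, tx, ty = estimate_rigid_transform((0, 0), (10, 0), (5, 5), (5, 15))
    print(register_point((2, 7), theta, tx, ty))
```

Once every site in the second image has been mapped back into the coordinates of the first, the two registered images can be compared site by site. A single fiducial would fix only the translation, which is consistent with the preference stated above for at least two fiducials when the array has complete rotational freedom.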

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,942,968, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a composition that includes a substrate with a surface comprising discrete sites, a reflective coating on the surface, and a population of microspheres distributed on the substrate. The microspheres comprise at least a first and a second subpopulation. Generally, at least one subpopulation comprises a bioactive agent. Additionally or alternatively, detection of a genetic biomarker can include a composition wherein the substrate comprises a first and a second surface, wherein the first surface comprises the discrete sites, and the reflective coating is on the second surface. The population of microspheres is distributed on the first surface. Additionally or alternatively, detection of a genetic biomarker can include a method of making a reflective array. The method includes providing a substrate with a surface comprising discrete sites, applying to the surface a coating of reflective material and distributing microspheres on the surface. Additionally or alternatively, detection of a genetic biomarker can include a method, wherein the substrate comprises a first and a second surface, wherein the first surface comprises discrete sites, the reflective material is on the second surface and the microspheres are distributed on the first surface. Additionally or alternatively, detection of a genetic biomarker can include a method comprising providing a preformed unitary fiber optic bundle comprising a proximal and a distal end, the distal end comprising a plurality of discrete sites comprising a population of microspheres, the population comprising at least first and second subpopulations, and imaging the fiber optic bundle from the distal end. 
A reflective coating may be applied to either the distal end or the proximal end of the fiber optic bundle. Additionally or alternatively, detection of a genetic biomarker can include an array composition comprising a substrate with a surface comprising discrete sites comprising alternatively shaped wells. The wells may contain a cross section that is shaped as a square, a hexagon, a star, a triangle, a pentagon or an octagon. Additionally or alternatively, detection of a genetic biomarker can include a method comprising providing a substrate with a plurality of discrete sites, the sites comprising alternatively shaped wells and a population of microspheres, the population comprising at least first and second subpopulations, and imaging the substrate. Additionally or alternatively, detection of a genetic biomarker can include an array composition comprising a substrate with a surface comprising discrete sites and a population of microspheres distributed on the substrate, wherein the microspheres comprise a bioactive agent and a signal transducer element. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a non-labeled target analyte in a sample comprising providing a substrate with a plurality of discrete sites, distributing on the sites a population of microspheres comprising a bioactive agent and a signal transducer element, contacting the substrate with the sample, whereby upon binding of the target analyte to the bioactive agent, a signal from the signal transducer element is altered as an indication of the presence of the target analyte. 
Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a chiral molecule in a sample comprising providing a substrate with a surface comprising at least first and second discrete sites and at least first and second bioactive agents attached to the first and second discrete sites respectively, contacting the substrate with the sample, illuminating the substrate with polarized light, and detecting rotation of the light in at least one of the first and second discrete sites as an indication of the presence of the chiral molecule. Additionally or alternatively, detection of a genetic biomarker can include a method of determining the location of a microsphere in an array comprising providing a substrate with a first surface comprising at least a first and a second discrete site, wherein the first discrete site comprises a microsphere, but the second discrete site does not comprise a microsphere, illuminating the substrate and detecting illumination of the substrate, whereby reduced illumination at the first discrete site relative to the second discrete site provides an indication of the presence of the first microsphere in the first discrete site. 
Additionally or alternatively, detection of a genetic biomarker can include a method of increasing signal output from an array comprising providing a substrate with a surface comprising at least first and second discrete sites and at least first and second labels attached to the first and second discrete sites respectively, cooling the substrate to at least below room temperature and detecting a signal from the first and second labels, whereby the signal is increased relative to a signal obtained from a substrate that is not cooled. Additionally or alternatively, detection of a genetic biomarker can include a method for background signal subtraction in an array comprising providing a substrate with a surface comprising at least first and second discrete sites and at least first and second labels attached to the first and second discrete sites respectively, detecting the signal from the first and second discrete sites in a plurality of different emissions, and subtracting the lowest signal from each of the first and second discrete sites from the remaining signals from the first and second discrete sites, respectively. 
Additionally or alternatively, detection of a genetic biomarker can include a method of correcting image non-uniformity comprising providing a substrate with a surface comprising at least first and second discrete sites, at least first and second labels attached to the first and second discrete sites respectively and at least a first internal reference point of known signal intensity, detecting a first and second signal from the first and second labels, respectively, detecting a signal from the internal reference point, and determining the variation between the signal from the internal reference point and the known signal intensity of the internal reference point as an indication of said image non-uniformity. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a target analyte in a sample comprising providing an array comprising a substrate with a surface comprising discrete sites, a reflective coating on said surface, and a population of microspheres distributed on the substrate. The microspheres comprise at least a first and a second subpopulation each comprising a different bioactive agent. The method further includes contacting the array with the sample, such that the target analyte binds to at least one of the bioactive agents and detecting the presence of the target analyte. In a preferred embodiment, the target analyte is labeled. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a target analyte in a sample comprising providing an array comprising a substrate with a surface comprising discrete sites comprising alternatively shaped wells and a population of microspheres distributed on the substrate. 
The microspheres comprise at least a first and a second subpopulation each comprising a different bioactive agent. The method further includes contacting the array with the sample, such that the target analyte binds to at least one of the bioactive agents and detecting the presence of the target analyte. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a target analyte in a sample comprising providing a substrate with a surface comprising at least first and second discrete sites and a population of microspheres distributed on the substrate, wherein the microspheres comprise at least a first and a second subpopulation each comprising a different bioactive agent, contacting the substrate with the sample, such that the target analyte binds to at least one of the bioactive agents. In some embodiments, the method includes cooling the substrate to at least below room temperature and detecting a signal, whereby the signal is increased relative to a signal obtained from a substrate that is not cooled.
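By way of illustration only, the background signal subtraction described above, in which the lowest signal detected at a discrete site across a plurality of emissions is subtracted from the remaining signals at that site, can be sketched in Python as follows. This is a minimal sketch and not the method of U.S. Pat. No. 6,942,968: the function name subtract_background, the dictionary layout, and the example intensities are hypothetical.

```python
def subtract_background(signals_by_site):
    """Subtract each site's lowest signal from all signals detected at that site.

    signals_by_site: dict mapping a site identifier to a list of intensities,
    one per emission. The lowest intensity at a site is treated as that
    site's background, so it becomes 0 after subtraction.
    """
    corrected = {}
    for site, signals in signals_by_site.items():
        background = min(signals)
        corrected[site] = [value - background for value in signals]
    return corrected


if __name__ == "__main__":
    # Made-up intensities for two discrete sites read in three emissions.
    raw = {"site_1": [120, 40, 300], "site_2": [55, 90, 60]}
    print(subtract_background(raw))
    # prints {'site_1': [80, 0, 260], 'site_2': [0, 35, 5]}
```

The background is taken per site rather than across the whole array, following the "respectively" in the description above, so a site with a high local background does not distort the corrected signals at other sites.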

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,890,764, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions comprising a substrate with a surface comprising discrete sites, and a population of microspheres distributed on the sites. At least one of the microspheres comprises a nanocrystal. The nanocrystal can be embedded in the microsphere, for example using the sol-gel polymerization process, or it can be attached to the microsphere. The microspheres optionally comprise bioactive agents and/or identifier binding ligands. In an additional aspect, the population of microspheres comprises at least a first and a second subpopulation comprising a first and a second bioactive agent, respectively, and a first and a second optical signature, respectively, capable of identifying each bioactive agent. At least one of the optical signatures comprises a nanocrystal. Additionally or alternatively, detection of a genetic biomarker can include methods of making a composition comprising forming a surface comprising individual sites on a substrate and distributing microspheres on the surface such that the individual sites contain microspheres. The microspheres comprise an optical signature, and at least one optical signature comprises at least one nanocrystal. Additionally or alternatively, detection of a genetic biomarker can include a method of determining the presence of a target analyte in a sample comprising contacting the sample with a composition. The composition comprises a substrate with a surface comprising discrete sites and a population of microspheres comprising at least a first and a second subpopulation each comprising a bioactive agent and an optical signature capable of identifying the bioactive agent.
The microspheres are distributed on the surface such that the discrete sites contain microspheres and wherein at least one of the optical signatures comprises at least one nanocrystal. The presence or absence of the target analyte is then determined. Additionally or alternatively, detection of a genetic biomarker can include methods of making a composition comprising adhering nanocrystals to porous silica, and sealing the pores of the silica using the sol-gel polymerization process.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0201992, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, apparatus, systems, and computer program products for determining nucleic acid fragment sequences using unique molecular indices (UMIs). In some implementations, the UMIs include nonrandom UMIs (NRUMIs) or variable-length, nonrandom unique molecular indices (vNRUMIs). Additionally or alternatively, detection of a genetic biomarker can include methods for sequencing nucleic acid molecules from a sample. The method includes: (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter includes a nonrandom unique molecular index, and wherein nonrandom unique molecular indices of the adapters have at least two different molecular lengths and form a set of variable-length, nonrandom unique molecular indices (vNRUMIs); (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of vNRUMIs; (d) identifying, among the plurality of reads, reads associated with a same variable-length, nonrandom unique molecular index (vNRUMI); and (e) determining a sequence of a DNA fragment in the sample using the reads associated with the same vNRUMI. In some implementations, identifying the reads associated with the same vNRUMI includes obtaining, for each read of the plurality of reads, alignment scores with respect to the set of vNRUMIs, each alignment score indicating similarity between a subsequence of a read and a vNRUMI, wherein the subsequence is in a region of the read in which nucleotides derived from the vNRUMI are likely located.
In some implementations, the alignment scores are based on matches of nucleotides and edits of nucleotides between the subsequence of the read and the vNRUMI. In some implementations, the edits of nucleotides include substitutions, additions, and deletions of nucleotides. In some implementations, each alignment score penalizes mismatches at the beginning of a sequence but does not penalize mismatches at the end of the sequence. In some implementations, obtaining an alignment score between a read and a vNRUMI includes: (a) calculating an alignment score between the vNRUMI and each one of all possible prefix sequences of the subsequence of the read; (b) calculating an alignment score between the subsequence of the read and each one of all possible prefix sequences of the vNRUMI; and (c) obtaining a largest alignment score among the alignment scores calculated in (a) and (b) as the alignment score between the read and the vNRUMI. In some implementations, the subsequence has a length equal to the length of the longest vNRUMI in the set of vNRUMIs. In some implementations, identifying the reads associated with the same vNRUMI in (d) further includes: selecting, for each read of the plurality of reads, at least one vNRUMI from the set of vNRUMIs based on the alignment scores; and associating each read of the plurality of reads with the at least one vNRUMI selected for the read. In some implementations, selecting the at least one vNRUMI from the set of vNRUMIs includes selecting a vNRUMI having a highest alignment score among the set of vNRUMIs. In some implementations, the at least one vNRUMI includes two or more vNRUMIs. In some implementations, the method further includes selecting one of the two or more vNRUMIs as the same vNRUMI of (d) and (e).
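The prefix-based alignment score of steps (a) through (c) above can be sketched as follows. This is an illustration under stated assumptions, not the scoring scheme of the incorporated publication: the +1 match and -1 mismatch/gap values, the restriction to non-empty prefixes, and the function names are all hypothetical. A single global-alignment table provides every prefix score needed, because its last row and last column hold the scores of one full sequence against each prefix of the other.

```python
def global_scores(a, b, match=1, mismatch=-1, gap=-1):
    """Table where entry [i][j] is the global alignment score of a[:i] and b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            table[i][j] = max(diagonal, table[i - 1][j] + gap, table[i][j - 1] + gap)
    return table


def umi_alignment_score(subsequence, umi):
    """Largest score over (a) umi vs. prefixes of the read subsequence and
    (b) the read subsequence vs. prefixes of the umi."""
    if not subsequence or not umi:
        raise ValueError("both sequences must be non-empty")
    table = global_scores(umi, subsequence)
    best_a = max(table[len(umi)][1:])
    best_b = max(table[i][len(subsequence)] for i in range(1, len(umi) + 1))
    return max(best_a, best_b)


def best_umis(subsequence, umis):
    """All UMIs sharing the highest alignment score for this read subsequence."""
    scores = {umi: umi_alignment_score(subsequence, umi) for umi in umis}
    top = max(scores.values())
    return [umi for umi in umis if scores[umi] == top]


# Hypothetical set with one 6-mer and one 7-mer; the read subsequence has
# the length of the longest index, as described in the text.
assigned = best_umis("ACGTACG", ["ACGTAC", "TTGCAAT"])
```

Because only prefixes are compared, bases at the end of the longer sequence that fall outside the index are not penalized, while a mismatch at the start lowers every candidate score.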
In some implementations, the adapters applied in (a) are obtained by: (i) providing a set of oligonucleotide sequences having at least two different molecular lengths; (ii) selecting a subset of oligonucleotide sequences from the set of oligonucleotide sequences, all edit distances between oligonucleotide sequences of the subset of oligonucleotide sequences meeting a threshold value, the subset of oligonucleotide sequences forming the set of vNRUMIs; and (iii) synthesizing the adapters each including a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one vNRUMI of the set of vNRUMIs. In some implementations, the threshold value is 3. In some implementations, the set of vNRUMIs includes vNRUMIs of 6 nucleotides and vNRUMIs of 7 nucleotides. In some implementations, the determining of (e) includes collapsing reads associated with the same vNRUMI into a group to obtain a consensus nucleotide sequence for the sequence of the DNA fragment in the sample. In some implementations, the consensus nucleotide sequence is obtained based partly on quality scores of the reads. In some implementations, the determining of (e) includes: identifying, among the reads associated with the same vNRUMI, reads having a same read position or similar read positions in a reference sequence, and determining the sequence of the DNA fragment using reads that (i) are associated with the same vNRUMI and (ii) have the same read position or similar read positions in the reference sequence. In some implementations, the set of vNRUMIs includes no more than about 10,000 different vNRUMIs. In some implementations, the set of vNRUMIs includes no more than about 1,000 different vNRUMIs. In some implementations, the set of vNRUMIs includes no more than about 200 different vNRUMIs. In some implementations, applying adapters to the DNA fragments in the sample includes applying adapters to both ends of the DNA fragments in the sample.
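Collapsing reads that share a vNRUMI and a read position into a consensus sequence, as described above, can be sketched as follows. The data layout (tuples of index, position, sequence, and per-base quality scores), the quality-weighted vote, the exact-position grouping, and the truncation to the shortest read are assumptions chosen for brevity; the text leaves these details open, including how "similar" positions are handled.

```python
from collections import defaultdict


def collapse_reads(reads):
    """Group reads by (index, position) and call a quality-weighted consensus.

    reads: iterable of (umi, position, sequence, qualities) tuples, where
    qualities is a list of per-base scores the same length as sequence.
    Returns a dict mapping (umi, position) to the consensus sequence.
    """
    groups = defaultdict(list)
    for umi, position, sequence, qualities in reads:
        groups[(umi, position)].append((sequence, qualities))

    consensus = {}
    for key, members in groups.items():
        # Assumption: compare only the span covered by every read in the group.
        length = min(len(sequence) for sequence, _ in members)
        bases = []
        for i in range(length):
            weight = defaultdict(int)
            for sequence, qualities in members:
                weight[sequence[i]] += qualities[i]
            # Ties are broken alphabetically so the result is deterministic.
            bases.append(max(sorted(weight), key=weight.get))
        consensus[key] = "".join(bases)
    return consensus


# Hypothetical reads: the second read carries a low-quality error at the last base.
example_reads = [
    ("ACGTAC", 100, "ACGT", [30, 30, 30, 30]),
    ("ACGTAC", 100, "ACGA", [30, 30, 30, 10]),
    ("ACGTAC", 100, "ACGT", [30, 30, 30, 30]),
    ("TTGCAAT", 100, "GGGG", [30, 30, 30, 30]),
]
collapsed = collapse_reads(example_reads)
```

Reads with different indices at the same position stay in separate groups, which is what lets distinct source fragments that map to the same location be told apart.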
Additionally or alternatively, detection of a genetic biomarker can include methods for preparing sequencing adapters, the methods including: (a) providing a set of oligonucleotide sequences having at least two different molecular lengths; (b) selecting a subset of oligonucleotide sequences from the set of oligonucleotide sequences, all edit distances between oligonucleotide sequences of the subset of oligonucleotide sequences meeting a threshold value, the subset of oligonucleotide sequences forming a set of variable-length, nonrandom unique molecular indexes (vNRUMIs); and (c) synthesizing a plurality of sequencing adapters, wherein each sequencing adapter includes a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one vNRUMI of the set of vNRUMIs. In some implementations, (b) includes: (i) selecting an oligonucleotide sequence from the set of oligonucleotide sequences; (ii) adding the selected oligonucleotide to an expanding set of oligonucleotide sequences and removing the selected oligonucleotide from the set of oligonucleotide sequences to obtain a reduced set of oligonucleotide sequences; (iii) selecting an instant oligonucleotide sequence from the reduced set that maximizes a distance function, wherein the distance function is a minimal edit distance between the instant oligonucleotide sequence and any oligonucleotide sequences in the expanding set, and wherein the distance function meets the threshold value; (iv) adding the instant oligonucleotide to the expanding set and removing the instant oligonucleotide from the reduced set; (v) repeating (iii) and (iv) one or more times; and (vi) providing the expanding set as the subset of oligonucleotide sequences forming the set of vNRUMIs. Additionally or alternatively, detection of a genetic biomarker can include a method for sequencing nucleic acid molecules from a sample, including (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter
includes a nonrandom unique molecular index, and wherein nonrandom unique molecular indices of the adapters have at least two different molecular lengths and form a set of variable-length, nonrandom unique molecular indices (vNRUMIs); (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of vNRUMIs; and (d) identifying, among the plurality of reads, reads associated with a same variable-length, nonrandom unique molecular index (vNRUMI). Additionally or alternatively, detection of a genetic biomarker can include a method for sequencing nucleic acid molecules from a sample, including (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter includes a unique molecular index (UMI), and wherein unique molecular indices (UMIs) of the adapters have at least two different molecular lengths and form a set of variable-length unique molecular indices (vUMIs); (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of vUMIs; and (d) identifying, among the plurality of reads, reads associated with a same variable-length unique molecular index (vUMI).
Additionally or alternatively, detection of a genetic biomarker can include a method for sequencing nucleic acid molecules from a sample, including (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter includes a unique molecular index (UMI) in a set of unique molecular indices (UMIs); (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of UMIs; (d) obtaining, for each read of the plurality of reads, alignment scores with respect to the set of UMIs, each alignment score indicating similarity between a subsequence of a read and a UMI; (e) identifying, among the plurality of reads, reads associated with a same UMI using the alignment scores; and (f) determining a sequence of a DNA fragment in the sample using the reads associated with the same UMI. Additionally or alternatively, detection of a genetic biomarker can include a system, apparatus, and computer program products for determining DNA fragment sequences implementing the methods. Additionally or alternatively, detection of a genetic biomarker can include a computer program product including a non-transitory machine readable medium storing program code that, when executed by one or more processors of a computer system, causes the computer system to implement a method for determining sequence information of a sequence of interest in a sample using unique molecular indices (UMIs). The program code includes instructions to perform the methods above.
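The greedy subset selection in steps (i) through (vi) of the adapter preparation method above can be sketched as follows. The function names are hypothetical. Two points are assumptions: "meeting the threshold value" is read as a minimal edit distance of at least the threshold, and the candidate pool is drawn from a two-letter alphabet only to keep the example small. The threshold of 3 and the 6- and 7-nucleotide lengths are taken from the text.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                        # deletion
                               current[j - 1] + 1,                     # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def select_umis(candidates, threshold=3):
    """Greedily grow a set whose pairwise edit distances are all >= threshold."""
    pool = list(dict.fromkeys(candidates))  # drop duplicates, keep order
    if not pool:
        return []
    chosen = [pool.pop(0)]                  # steps (i) and (ii): seed the set
    # Minimal distance from each remaining candidate to the chosen set.
    nearest = {c: edit_distance(c, chosen[0]) for c in pool}
    while pool:
        best = max(pool, key=nearest.get)   # step (iii): maximize the minimal distance
        if nearest[best] < threshold:
            break                           # no remaining candidate meets the threshold
        chosen.append(best)                 # step (iv)
        pool.remove(best)
        del nearest[best]
        for c in pool:                      # update distances before repeating, step (v)
            nearest[c] = min(nearest[c], edit_distance(c, best))
    return chosen                           # step (vi)


# Hypothetical candidate pool of 6-mers and 7-mers over a reduced alphabet.
candidates = ["".join(p) for n in (6, 7) for p in product("AC", repeat=n)]
selected = select_umis(candidates, threshold=3)
```

Since every added sequence is at least the threshold away from all earlier members, every pair in the result meets the threshold, which is the property the adapter set relies on for error-tolerant read assignment.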

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0201974, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and compositions for targeted amplification of DNA and sample identification. Additionally or alternatively, detection of a genetic biomarker can include methods for obtaining nucleic acid sequence information from a biological sample comprising: (a) providing a biological sample comprising different target nucleic acids; (b) contacting the biological sample with a plurality of different probe sets to form hybridization complexes with the different target nucleic acids; (c) amplifying the nucleic acid from the biological samples to produce amplicons; wherein there is no purification of the nucleic acid from the biological sample prior to the contacting step (b); and (d) obtaining nucleic acid sequence information for a plurality of portions of the amplified sample. Additionally or alternatively, detection of a genetic biomarker can include a method of obtaining nucleic acid sequence information from a FFPE sample comprising: (a) providing a FFPE sample comprising different target nucleic acids embedded within a preserved tissue; (b) contacting the FFPE sample with a plurality of different probe sets to form hybridization complexes with the different target nucleic acids; (c) amplifying the nucleic acid from the FFPE samples to produce amplicons; wherein there is no purification of the nucleic acid from the FFPE sample prior to the contacting step (b); and (d) obtaining nucleic acid sequence information for a plurality of the amplicons. In particular embodiments, there is no purification of the nucleic acid from the FFPE sample prior to the amplifying in step (c).
Additionally or alternatively, detection of a genetic biomarker can include methods for amplification of nucleic acid from a FFPE sample comprising: (a) providing a FFPE sample comprising nucleic acid embedded within a preserved tissue, the nucleic acid having, from 3′ to 5′: contiguous first, second, and third target domains; (b) contacting the FFPE sample with a plurality of different probe sets to form hybridization complexes with the different target nucleic acids, wherein each probe set comprises: (i) a first probe comprising, from 5′ to 3′: a first priming sequence and a sequence that is substantially complementary to the first target domain; and (ii) a second probe comprising 5′ to 3′: a sequence substantially complementary to the third target domain, and a second priming sequence; (c) contacting the hybridization complexes with an extension enzyme and nucleotides, wherein the first probes are extended along the second target domains of hybridization complexes formed in (b); (d) ligating the extended first probes to the second probes to form amplification templates; and (e) amplifying the amplification templates with first and second primers that are complementary to the first priming sequence and the second priming sequence to produce amplicons and obtaining nucleic acid sequence information for a plurality of the amplicons.
Additionally or alternatively, detection of a genetic biomarker can include a method for nucleic acid sample identification comprising: (a) providing a nucleic acid-containing cellular sample; (b) lysing cells of the sample with a lysis reagent to liberate nucleic acid from within the cells of the cellular sample, thereby forming a lysate; (c) amplifying the nucleic acid from the lysed samples; wherein there is no purification of the nucleic acid from the lysate prior to beginning the amplification step (c); and (d) obtaining nucleic acid sequence information for a plurality of portions of the amplified sample, and comparing the sequence information to a second set of sequence information. In certain aspects, the nucleic acid is DNA. In certain aspects, the sample is a blood sample. In certain aspects, the sample comprises dried blood. In certain aspects, the sample comprises a FFPE tissue sample. In certain aspects, the second set of sequence information comprises a whole genome sequence. In certain aspects, the second set of sequence information comprises exome sequence information. In certain aspects, the amplifying comprises a targeted amplification reaction. In certain aspects, the targeted amplification reaction comprises extension and ligation of two probes. In certain aspects, the targeted amplification reaction comprises polymerase chain reaction using at least two amplification primers that are specific for a portion of the sample genome.
Additionally or alternatively, detection of a genetic biomarker can include a method of tracking the identity of a biological sample during different stages of sample processing, comprising: (a) providing a nucleic acid-containing cellular sample; (b) separating a portion of the sample into a first portion and a second portion and obtaining a first set of nucleic acid sequence information from the first portion of the biological sample according to the above embodiments, wherein the first set of nucleic acid sequence information comprises identity informative sequence information; (c) purifying nucleic acid from the second portion and obtaining a second set of sequence information; and (d) using computer-assisted logic, comparing the identity informative sequence information from the first set of nucleic acid sequence information to the second set of sequence information to confirm that the first and second sets of sequence information were obtained from the same source. Additionally or alternatively, detection of a genetic biomarker can include a method of confirming the source of two different biological samples comprising: (a) providing a first nucleic acid-containing cellular sample; (b) obtaining a first set of nucleic acid sequence information from the first portion of the biological sample according to some of the above embodiments, wherein the first set of nucleic acid sequence information comprises identity informative sequence information; (c) providing a second nucleic acid sample comprising purified nucleic acid and obtaining a second set of sequence information; and (d) using computer-assisted logic, comparing the identity informative sequence information from the first set of nucleic acid sequence information to the second set of sequence information to confirm that the first and second sets of sequence information were obtained from the same individual.
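The computer-assisted comparison in step (d) of the two methods above can be illustrated with a simple genotype concordance check. Everything specific in this sketch is an assumption: the representation of identity informative sequence information as genotype calls keyed by site name, the site names themselves, and the 0.95 concordance and 3-site minimums are hypothetical values chosen only to make the comparison concrete.

```python
def genotype_concordance(first, second):
    """Fraction of shared sites at which two genotype sets agree.

    first, second: dicts mapping a site name to a genotype string.
    Sites present in only one set are ignored.
    """
    shared = [site for site in first if site in second]
    if not shared:
        raise ValueError("no shared sites to compare")
    matches = sum(first[site] == second[site] for site in shared)
    return matches / len(shared)


def same_source(first, second, min_concordance=0.95, min_sites=3):
    """Decide whether two genotype sets plausibly come from the same source.

    The thresholds are illustrative defaults, not values from the text.
    """
    shared_count = sum(site in second for site in first)
    if shared_count < min_sites:
        return False
    return genotype_concordance(first, second) >= min_concordance


# Hypothetical genotype calls from an unpurified first portion and from
# purified nucleic acid of the same and of a different individual.
first_portion = {"site1": "AA", "site2": "AG", "site3": "GG", "site4": "CT"}
second_portion = {"site1": "AA", "site2": "AG", "site3": "GG"}
other_sample = {"site1": "AG", "site2": "GG", "site3": "GG", "site4": "CC"}

tracked = same_source(first_portion, second_portion)
mismatched = same_source(first_portion, other_sample)
```

A production comparison would also need to account for genotyping error and allele dropout, which is why a concordance threshold below 1.0 is used here rather than exact equality.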

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0155774, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions, systems, and methods for sequencing polynucleotides using tethers anchored to polymerases adjacent to nanopores. Under one aspect, a composition includes a nanopore including a first side, a second side, and an aperture extending through the first and second sides. The composition also can include a plurality of nucleotides, wherein each of the nucleotides includes an elongated tag. The composition also can include first and second polynucleotides, the first polynucleotide being complementary to the second polynucleotide. The composition also can include a polymerase disposed adjacent to the first side of the nanopore, the polymerase configured to add nucleotides of the plurality of nucleotides to the first polynucleotide based on a sequence of the second polynucleotide. The composition also can include a permanent tether including a head region, a tail region, and an elongated body disposed therebetween, the head region being anchored to the polymerase, wherein the elongated body occurs in the aperture of the nanopore. The composition also can include a first moiety disposed on the elongated body, wherein the first moiety is configured to bind to the elongated tag of a first nucleotide upon which the polymerase is acting, as well as a reporter region disposed on the elongated body, wherein the reporter region is configured to indicate when the first nucleotide is complementary or is not complementary to a next nucleotide in the sequence of the second polynucleotide. Under another aspect, a method can include providing a nanopore including a first side, a second side, and an aperture extending through the first and second sides.
The method further can include providing a plurality of nucleotides, wherein each of the nucleotides includes an elongated tag. The method further can include providing first and second polynucleotides, the first polynucleotide being complementary to the second polynucleotide. The method further can include providing a polymerase disposed adjacent to the first side of the nanopore, the polymerase configured to add nucleotides of the plurality of nucleotides to the first polynucleotide based on a sequence of the second polynucleotide, wherein the polymerase is anchored to a permanent tether including a head region, a tail region, and an elongated body disposed therebetween, the elongated body occurring in the aperture of the nanopore. The method further can include determining that a first nucleotide is being acted upon by the polymerase based on binding of the elongated tag to a first moiety disposed on the elongated body. The method further can include, with a reporter region disposed on the elongated body, indicating when the first nucleotide is complementary or is not complementary to a next nucleotide in the sequence of the second polynucleotide.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0141020, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include substrates and methods useful for placing a single molecule onto a target area. In a first aspect is a substrate that includes a plurality of first and second capture primers immobilized to a feature on the substrate. At least one target polynucleotide, one end attached to one of the capture primers and the other end linked to a target molecule, wherein the target polynucleotide includes a target region flanked by first and second capture primer binding regions complementary to the first and second capture primers, the second capture primer binding region includes a base pair mismatch to the second capture primer, and a plurality of clonal amplicons complementary to the target polynucleotide immobilized to the feature. In some embodiments, the base pair mismatch is a 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 base pair mismatch. In some embodiments, the base pair mismatch is a three base pair mismatch. In some embodiments, the substrate further includes a plurality of features. In some embodiments, the feature includes a single target molecule. In some embodiments, the feature is filled to capacity with the plurality of clonal amplicons. In some embodiments, the plurality of features includes a single target molecule. In some embodiments, two or more of the features include different single target molecules. In some embodiments, the features are filled to capacity with the plurality of clonal amplicons. Additionally or alternatively, detection of a genetic biomarker can include methods of placing a single target molecule on a feature of a substrate.
In one aspect, the method is a method of placing a single target molecule on a feature of a substrate by hybridizing a plurality of first and second capture primers immobilized to a feature on a substrate with at least one target polynucleotide, where the target polynucleotide includes a target region flanked by first and second capture primer binding regions complementary to the first and second capture primers, and the second capture primer binding region includes a base pair mismatch to the second capture primer and being linked to a target molecule. The method further includes amplifying the at least one target polynucleotide at an average amplification rate that exceeds an average transport rate of a target polynucleotide to a feature to produce a plurality of clonal amplicons complementary to the target polynucleotide. In some embodiments of the methods, the base pair mismatch is a 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 base pair mismatch. In some embodiments of the methods, the base pair mismatch is a three base pair mismatch. In some embodiments of the methods, the substrate comprises a plurality of features. In some embodiments of the methods, the feature includes a single target molecule. In some embodiments of the methods, the feature is filled to capacity with the plurality of clonal amplicons. In some embodiments of the methods, the plurality of features includes a single target molecule. In some embodiments of the methods, two or more of the features include different single target molecules. In some embodiments of the methods, the features are filled to capacity with the plurality of clonal amplicons. In some embodiments of the methods, the average amplification rate of subsequent amplicons produced at the feature exceeds the average amplification rate of a first amplicon.
In some embodiments, the target polynucleotide includes one or more polynucleotides selected from the group consisting of RNA, DNA, and PNA. In some embodiments, the target polynucleotides include double stranded DNA (dsDNA). In some embodiments, the target polynucleotide comprises less than 1,000 nucleotides. In some embodiments, the target polynucleotide comprises between 10 to 25, 26 to 50, 51 to 100, 101 to 200, 201 to 300, 301 to 400, 401 to 500, 501 to 600, 601 to 700, 701 to 800, 801 to 900, or 901 to 1000 base pairs in length. In some embodiments, the target molecule includes a polypeptide, polynucleotide, carbohydrate, amino acid, nucleotide, monosaccharide, hapten, ligand, antigen, analyte, small molecule organic compound or inorganic compound. In some embodiments, the target molecule includes a polypeptide. In some embodiments, the polypeptide is selected from the group consisting of a nanopore, binding polypeptide and enzyme. In some embodiments, the nanopore is selected from the group consisting of MspA, OmpF, OmpG, NalP, WZA, ClyA toxin, α-hemolysin, anthrax toxin, leukocidins and DNA origami nanopore. In some embodiments, the binding polypeptide is selected from the group consisting of an antibody, a Fab, a Fab′, a F(ab′)2, a scFv, a diabody, a triabody, a minibody and a single-domain antibody (sdAb), T cell receptor, microcins, neuropeptides, G-protein coupled receptors, epidermal growth factor receptor and HER2.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0095969, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, systems and apparatus for capturing, integrating, organizing, navigating and querying large-scale data from high-throughput biological and chemical assay platforms. Some embodiments provide methods, systems and interfaces for associating experimental data, features and groups of data related by structure and/or function with chemical, medical and/or biological terms in an ontology or taxonomy. Some embodiments also provide methods, systems and interfaces for filtering data by data source information, allowing dynamic navigation through large amounts of data to find the most relevant results for a particular query. A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions.
One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations including: (a) selecting, by the one or more processors, a plurality of gene sets from a database, wherein each gene set of the plurality of gene sets includes a plurality of genes and a plurality of experimental values associated with the plurality of genes, and wherein the plurality of experimental values are correlated with the biological, chemical or medical concept of interest in at least one experiment; (b) determining, for each gene set and by the one or more processors, one or more experimental gene scores for first one or more genes among the plurality of genes using one or more experimental values of the first one or more genes; (c) determining, for each gene set and by the one or more processors, one or more in silico gene scores for second one or more genes among the plurality of genes based at least in part on the first one or more genes' correlations with the second one or more genes, wherein the first one or more genes' correlations with the second one or more genes are indicated in other gene sets in the database besides the plurality of gene sets; (d) obtaining, by the one or more processors, summary scores for the first and second one or more genes based at least in part on the one or more experimental gene scores for the first one or more genes determined in (b) and the one or more in silico gene scores for the second one or more genes determined in (c), wherein each summary score is aggregated across the plurality of gene sets; and (e) identifying, by the one or more processors, the genes that are potentially associated with the biological, chemical or medical concept of interest using the summary scores of the first and second one or more genes. Implementations may include one or more of the following features.
In some implementations, (c) includes, for each gene set of the plurality of gene sets: (i) identifying a second plurality of gene sets from the database, each gene set of the second plurality of gene sets including a second plurality of genes and a second plurality of experimental values associated with the second plurality of genes, and where the second plurality of experimental values are correlated with a first gene among the first one or more genes. The method may also include (ii) aggregating the experimental values across the second plurality of gene sets to obtain a vector of aggregated values for the first gene among the first one or more genes. The method may also include (iii) applying (i) and (ii) to one or more other genes among the first one or more genes, thereby obtaining one or more vectors of experimental values for the one or more other genes among the first one or more genes. The method may also include (iv) aggregating vectors of aggregated values for the first gene and the one or more other genes among the first one or more genes, thereby obtaining one compressed vector including the one or more in silico gene scores for the second one or more genes. Additionally or alternatively, detection of a genetic biomarker can include a method where each of the aggregated vectors of (iv) for a particular gene among the first one or more genes is weighted in proportion to an experimental value of the particular gene. The method where each of the aggregated vectors of (iv) for a particular gene among the first one or more genes is weighted in proportion to a number of gene sets of the second plurality of gene sets identified for the particular gene.
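Steps (i) through (iv) above, with each per-gene vector weighted in proportion to that gene's experimental value, can be sketched as follows. The data layout (gene sets as dictionaries of gene to value), the use of a mean as the aggregation in (ii), the normalization of weights by their absolute sum, and the exclusion of genes that already have experimental values are all assumptions made for illustration; the gene names are hypothetical.

```python
from collections import defaultdict


def aggregate_gene_sets(gene_sets):
    """Step (ii): mean value per gene across the gene sets for one first gene."""
    totals, counts = defaultdict(float), defaultdict(int)
    for gene_set in gene_sets:
        for gene, value in gene_set.items():
            totals[gene] += value
            counts[gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


def in_silico_scores(experimental, correlated_sets):
    """Steps (iii) and (iv): combine per-gene vectors into one compressed vector.

    experimental: dict of first gene -> experimental value.
    correlated_sets: dict of first gene -> list of gene sets (dicts of
    gene -> value) drawn from elsewhere in the database.
    """
    total_weight = sum(abs(value) for value in experimental.values())
    if total_weight == 0:
        raise ValueError("experimental values must not all be zero")
    scores = defaultdict(float)
    for gene, value in experimental.items():
        vector = aggregate_gene_sets(correlated_sets.get(gene, []))
        weight = value / total_weight  # proportional to the experimental value
        for other_gene, aggregated in vector.items():
            if other_gene in experimental:
                continue  # assumption: score only genes lacking measurements
            scores[other_gene] += weight * aggregated
    return dict(scores)


# Hypothetical inputs: G1 and G2 were measured; G3 and G4 were not.
measured = {"G1": 2.0, "G2": 2.0}
database_sets = {
    "G1": [{"G3": 1.0}, {"G3": 0.5}],
    "G2": [{"G3": 0.25, "G4": 1.0}],
}
inferred = in_silico_scores(measured, database_sets)
```

Weighting by the number of gene sets identified for each gene, the alternative the text mentions, would only change how `weight` is computed.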
Some implementations provide the method further including determining, before (d), one or more gene-group scores for third one or more genes. Some implementations provide the method where each gene-group score for a particular gene is determined using (i) gene memberships of one or more gene groups that each include a group of genes related to a group label, where the group of genes includes the particular gene, and (ii) at least some of the one or more experimental values of the first one or more genes. Some implementations provide the method where (d) includes obtaining the summary scores for the first and second one or more genes based at least in part on the gene-group scores for at least some of the third one or more genes, as well as the one or more experimental scores for the first one or more genes determined in (b) and the one or more in silico scores for the second one or more genes determined in (c). Some implementations provide the method where determining the one or more gene-group scores for the third one or more genes includes: identifying, for a particular gene among the third one or more genes, the one or more gene groups that each include the particular gene. The method may also include determining, for each gene group, a percentage of members of the gene group that are among the first one or more genes. The method may also include aggregating, for each gene group, one or more experimental values of at least some of the first one or more genes that are members of the gene group, thereby obtaining a sum experimental value for the gene group. The method may also include determining, for the particular gene among the third one or more genes, a gene-group score using the percentage of members of the gene group that are among the first one or more genes and the sum experimental value for the gene group.
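
The gene-group score can be sketched as follows, using the product-and-sum variant in which each group contributes the product of its member percentage and its sum experimental value. The function name and data layout are assumptions, the percentage is expressed as a fraction between 0 and 1, and the gene and group labels are illustrative only.

```python
from typing import Dict, Set


def gene_group_score(
    gene: str,
    gene_groups: Dict[str, Set[str]],
    experimental_values: Dict[str, float],
) -> float:
    """Score a gene from the groups it belongs to.

    For every group containing `gene`:
      fraction  = share of group members that have an experimental value
      sum_value = sum of those members' experimental values
    The score is the sum of fraction * sum_value over the groups.
    """
    score = 0.0
    for members in gene_groups.values():
        if gene not in members:
            continue
        measured = [g for g in members if g in experimental_values]
        fraction = len(measured) / len(members)
        sum_value = sum(experimental_values[g] for g in measured)
        score += fraction * sum_value
    return score


# Hypothetical groups and values, for illustration only.
example_groups = {
    "group_A": {"TP53", "MDM2", "CDKN1A", "BAX"},
    "group_B": {"BAX", "BCL2"},
}
example_experimental = {"TP53": 2.0, "MDM2": 1.0, "BCL2": 0.5}
bax_score = gene_group_score("BAX", example_groups, example_experimental)
```

Here `BAX` has no experimental value of its own but still receives a score because it shares groups with measured genes.
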
Some implementations provide the method where determining the gene-group score using the percentage of members of the gene group that are among the first one or more genes and the sum experimental value for the gene group includes: obtaining, for each gene group, a product of the percentage of members and the sum experimental value, thereby obtaining one or more products for the one or more gene groups. The method may also include summing, across the one or more gene groups, the one or more products, thereby obtaining a summed product. The method may also include determining, for the particular gene among the third one or more genes, a gene-group score based on the summed product. In some implementations, the method further includes, before (d), determining interactome scores respectively for fourth one or more genes. In some implementations, each interactome score for a particular gene is determined using (i) connections between the particular gene and other genes connected to the particular gene in a network of genes and (ii) at least some of the one or more experimental values of the first one or more genes. In some implementations, (d) includes obtaining the summary scores for at least the first one or more genes and the second one or more genes based at least in part on the interactome scores for at least some of the fourth one or more genes, as well as the one or more experimental gene scores for the first one or more genes determined in (b) and the one or more in silico gene scores for the second one or more genes determined in (c). In some implementations, the network of genes is based on interactions and relations among genes, proteins, and/or phospholipids. In some implementations, calculating the interactome score includes calculating the interactome score as Ni′:

$$N_i' = N_i + \sum_{n} \left( (N_i + N_n) \cdot \mathrm{edge\_weight}_n \right)$$

wherein Ni is the summary score of the particular gene i, Nn is a summary score of gene n connected to the particular gene, and edge_weightn is the weight of the edge connecting the particular gene i and gene n. In some implementations, calculating the interactome score further includes: saving Ni′ that are smaller than a second threshold in a first pass dictionary; and repeating the calculation for all genes in the first pass dictionary, thereby updating the interactome scores. In some implementations, the method further includes training the model by optimizing an objective function. In some implementations, training the model includes applying a bootstrap technique to bootstrap samples. In some implementations, the objective function relates to at least one summary score distribution after bootstrapping. In some implementations, optimizing the objective function includes minimizing differences of summary scores between a training set and a validation set. In some implementations, optimizing the objective function includes maximizing a distance between a summary score distribution obtained from the plurality of gene sets and a summary score distribution obtained from random gene sets. In some implementations, summary scores are ranked and binned in buckets of a defined size, wherein penalty scores are assigned to the buckets, the penalty scores favoring higher ranked summary scores. In some implementations, the objective function is based only on top ranked summary scores. In some implementations, training the model includes using the objective function in an unsupervised machine learning approach to learn parameters of the model. In some implementations, the model has the form:

$$F(\theta) = k_1 c_1 + k_2 c_2 + \cdots + k_n c_n$$

wherein θ are parameters of the model, ci are components of the model, and ki are weight factors for the components. In some implementations, the method further includes partitioning one or more of the components of the model into sub-components based on sample weights of experimental data types. In some implementations, the summary scores of the first and second one or more genes are penalized based on how likely experimental values of the first and second one or more genes in one or more random gene sets are correlated with the biological, chemical or medical concept of interest. In some implementations, each summary score of a particular gene is penalized by a penalty value that is inversely proportional to a p value of a rank product, wherein the rank product includes a product of ranks of the particular gene across the one or more random gene sets. One general aspect includes a computer program product including a non-transitory machine readable medium storing program code that, when executed by one or more processors of a computer system, causes the computer system to implement a method for identifying genes that are potentially associated with a biological, chemical or medical concept of interest, said program code including: (a) code for selecting a plurality of gene sets from a database, where each gene set of the plurality of gene sets includes a plurality of genes and a plurality of experimental values associated with the plurality of genes, and where the plurality of experimental values are correlated with the biological, chemical or medical concept of interest in at least one experiment. The program code also includes (b) code for determining, for each gene set, one or more experimental gene scores for first one or more genes among the plurality of genes using one or more experimental values of the first one or more genes.
The program code also includes (c) code for determining, for each gene set, one or more in silico gene scores for second one or more genes among the plurality of genes based at least in part on the first one or more genes' correlations with the second one or more genes, where the first one or more genes' correlations with the second one or more genes are indicated in other gene sets in the database besides the plurality of gene sets. The program code also includes (d) code for obtaining summary scores for the first and second one or more genes based at least in part on the one or more experimental gene scores for the first one or more genes determined in (b) and the one or more in silico gene scores for the second one or more genes determined in (c), where each summary score is aggregated across the plurality of gene sets. The program code also includes (e) code for identifying the genes that are potentially associated with the biological, chemical or medical concept of interest using the summary scores of the first and second one or more genes.
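
The interactome score Ni′ defined above (the summary score of gene i plus, for each connected gene n, the sum of the two summary scores multiplied by the edge weight) can be sketched for a single pass as follows. The function name, the adjacency-dictionary layout, and the treatment of neighbors without a summary score as zero are assumptions; the optional second pass over a first pass dictionary is not shown.

```python
from typing import Dict


def interactome_score(
    gene: str,
    summary_scores: Dict[str, float],
    edges: Dict[str, Dict[str, float]],
) -> float:
    """Single-pass interactome score.

    Computes N_i' = N_i + sum over neighbors n of (N_i + N_n) * edge_weight_n.
    `edges[gene]` maps each neighbor to the weight of the connecting edge.
    Neighbors missing from `summary_scores` count as 0.0 (an assumption).
    """
    n_i = summary_scores[gene]
    neighbor_term = sum(
        (n_i + summary_scores.get(neighbor, 0.0)) * weight
        for neighbor, weight in edges.get(gene, {}).items()
    )
    return n_i + neighbor_term


# Hypothetical network, for illustration only.
example_summary = {"A": 1.0, "B": 2.0, "C": 0.5}
example_edges = {"A": {"B": 0.5, "C": 0.25}}
score_a = interactome_score("A", example_summary, example_edges)
score_b = interactome_score("B", example_summary, example_edges)
```

A gene with no recorded edges, such as `B` here, keeps its summary score unchanged.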

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0023119, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for preparing a sequencing library that includes nucleic acids from a plurality of single cells. In one embodiment, the method includes providing isolated nuclei from a plurality of cells; subjecting the isolated nuclei to a chemical treatment to generate nucleosome-depleted nuclei while maintaining integrity of the isolated nuclei; distributing subsets of the nucleosome-depleted nuclei into a first plurality of compartments and contacting each subset with a transposome complex, where the transposome complex in each compartment includes a transposase and a first index sequence that is different from first index sequences in the other compartments; fragmenting nucleic acids in the subsets of nucleosome-depleted nuclei into a plurality of nucleic acid fragments and incorporating the first index sequences into at least one strand of the nucleic acid fragments to generate indexed nuclei that include indexed nucleic acid fragments, where the indexed nucleic acid fragments remain attached to the transposases; combining the indexed nuclei to generate pooled indexed nuclei; distributing subsets of the pooled indexed nuclei into a second plurality of compartments; incorporating into the indexed nucleic acid fragments in each compartment a second index sequence to generate dual-index fragments, where the second index sequence in each compartment is different from second index sequences in the other compartments; and combining the dual-index fragments, thereby producing a sequencing library that includes whole genome nucleic acids from the plurality of single cells.
In one embodiment, the chemical treatment includes a treatment with a chaotropic agent capable of disrupting nucleic acid-protein interactions, such as lithium 3,5-diiodosalicylic acid. In one embodiment, the chemical treatment includes a treatment with a detergent capable of disrupting nucleic acid-protein interactions, such as sodium dodecyl sulfate (SDS). In one embodiment, the nuclei are treated with a cross-linking agent, such as formaldehyde, before subjecting the isolated nuclei to the chemical treatment. The cross-linking agent can be at a concentration from about 0.2% to about 2%, and in one embodiment is about 1.5%. In one embodiment, the cross-linking by formaldehyde is reversed after distributing subsets of the pooled indexed nuclei and before incorporating into the indexed nucleic acid fragments in each compartment a second index sequence. In one embodiment, the reversal of the cross-linking includes incubation at about 55° C. to about 72° C. In one embodiment, the transposases are disassociated from the indexed nucleic acid fragments prior to the reversal of the cross-linking. In one embodiment, the transposases are disassociated from the indexed nucleic acid fragments using sodium dodecyl sulfate (SDS). In one embodiment, the nuclei are treated with a restriction enzyme prior to fragmenting nucleic acids in the subsets of nucleosome-depleted nuclei into a plurality of nucleic acid fragments and incorporating the first index sequences. In one embodiment, the nuclei are treated with a ligase after treatment with the restriction enzyme. In one embodiment, the distributing subsets of the nucleosome-depleted nuclei, the distributing subsets of the pooled indexed nuclei, or the combination thereof, is performed by fluorescence-activated nuclei sorting. In one embodiment, the subsets of the nucleosome-depleted nuclei include approximately equal numbers of nuclei, and in one embodiment, the subsets of the nucleosome-depleted nuclei include from 1 to about 2000 nuclei.
In one embodiment, the subsets of the pooled indexed nuclei include approximately equal numbers of nuclei, and in one embodiment, the subsets of the pooled indexed nuclei include from 1 to about 25 nuclei. In one embodiment, the subsets of the pooled indexed nuclei include at least 10 times fewer nuclei than the subsets of the nucleosome-depleted nuclei, or at least 100 times fewer nuclei than the subsets of the nucleosome-depleted nuclei. In one embodiment, the first plurality of compartments, the second plurality of compartments, or the combination thereof, is a multiwell plate, such as a 96-well plate or a 384-well plate. In one embodiment, the transposome complex is added to the compartments after the subsets of nucleosome-depleted nuclei are distributed into the compartments. In one embodiment, each of the transposome complexes includes a transposon, and each of the transposons includes a transferred strand. In one embodiment, the transferred strand includes the first index sequence and a first universal sequence. In one embodiment, the incorporation of the second index sequence into the indexed nucleic acid fragments includes contacting the indexed nucleic acid fragments in each compartment with a first universal primer and a second universal primer, each including an index sequence and each including a sequence identical to or complementary to a portion of the first universal sequence, and performing an exponential amplification reaction. In one embodiment, the exponential amplification reaction can be a polymerase chain reaction (PCR), and in one embodiment, the PCR can include 15 to 30 cycles.
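
The two rounds of indexing described above keep the second-round subsets small, and a short calculation suggests one reason why. The sketch below assumes that pooled nuclei carry first index sequences uniformly at random and that each compartment receives exactly one index per round; neither assumption comes from the source, and the function names are illustrative.

```python
def index_combinations(first_round_wells: int, second_round_wells: int) -> int:
    """Number of distinct (first index, second index) pairs available."""
    return first_round_wells * second_round_wells


def collision_probability(first_round_wells: int, nuclei_per_second_well: int) -> float:
    """Chance that a nucleus shares its index pair with another nucleus.

    All nuclei in one second-round well receive the same second index, so a
    collision occurs when at least one of the other nuclei in that well
    carries the same first index (assumed uniform over first-round wells).
    """
    if first_round_wells < 1 or nuclei_per_second_well < 1:
        raise ValueError("well and nuclei counts must be at least 1")
    others = nuclei_per_second_well - 1
    return 1.0 - (1.0 - 1.0 / first_round_wells) ** others


# Example: a 96-well first round and a 384-well second round.
total_pairs = index_combinations(96, 384)
single_nucleus = collision_probability(96, 1)
pair_of_nuclei = collision_probability(96, 2)
many_nuclei = collision_probability(96, 25)
```

Under these assumptions the collision chance grows with the number of nuclei per second-round well, which is consistent with the small subsets recited above.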

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0037950, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include microarrays and methods of modifying immobilized capture primers. In one aspect, provided is a microarray including: a) a substrate including at least one well, a surface surrounding the well and an inner well surface; b) a first layer covering the inner well surface and including at least one first capture primer pair; and c) a second layer covering the first layer and the surface surrounding the well. In another aspect, provided is a microarray including: a) a substrate including at least one well, a surface surrounding the well and an inner well surface; and b) a layer covering the inner well surface and including at least one first capture primer pair and at least one second capture primer pair. In another aspect, provided is a method for amplifying a nucleic acid, including: a) producing a first layer on a substrate, wherein the substrate includes at least one well, a surface surrounding the well and an inner well surface, wherein the first layer covers the inner well surface; b) depositing at least one first capture primer pair in the first layer; c) producing a second layer on the substrate covering the first layer and the surface surrounding the well; d) contacting a sample including a plurality of target polynucleotides with the substrate under conditions sufficient for a target polynucleotide to hybridize with a capture primer of the at least one first capture primer pair, and e) performing a first kinetic exclusion assay (KEA) to produce a clonal population of amplicons from the target polynucleotide inside the well, thereby amplifying the target polynucleotide.
Additionally or alternatively, detection of a genetic biomarker can include a method for amplifying a nucleic acid, including: a) producing a first layer on a substrate, wherein the substrate includes at least one well, a surface surrounding the well and an inner well surface, wherein the first layer at least partially covers the inner well surface; b) depositing at least one first capture primer pair in the first layer, wherein the first capture primer pair includes a plurality of first capture primers including a 3′ portion including an Illumina® P5 primer nucleotide sequence and a plurality of second capture primers including a 3′ portion including an Illumina® P7 primer nucleotide sequence; c) producing a second layer on the substrate covering the first layer and the surface surrounding the well; d) depositing at least one second capture primer pair in the second layer, wherein the second capture primer pair is 3′ phosphate-terminated and includes a plurality of first capture primers including a 3′ portion including an Illumina® P5 primer nucleotide sequence and a plurality of second capture primers including a 3′ portion including an Illumina® P7 primer nucleotide sequence; e) contacting a sample including a plurality of target polynucleotides with the substrate under conditions sufficient for a single target polynucleotide per well to hybridize with a primer of the at least one first capture primer pair, wherein the target polynucleotides are flanked by complementary universal primer regions each including a complementary Illumina® P5′ primer nucleotide sequence or a complementary Illumina® P7′ primer nucleotide sequence; f) performing a first KEA to produce a monoclonal population of amplicons from the single target polynucleotide inside the at least one well, thereby amplifying the target polynucleotide; g) contacting the substrate with a T4-kinase to deblock the primers of the second primer pair, and h) performing bridge amplification or a second KEA to enlarge the monoclonal population of amplicons of the single target polynucleotide beyond the well.
Additionally or alternatively, detection of a genetic biomarker can include a method for amplifying a nucleic acid, including: a) producing a first layer on a substrate, wherein the substrate includes at least one well, a surface surrounding the well and an inner well surface, wherein the first layer at least partially covers the inner well surface; b) depositing at least one first capture primer pair in the first layer, wherein the first capture primer pair includes a plurality of at least one first capture primers including a 3′ portion including an Illumina® P5 primer nucleotide sequence and an Illumina® SBS3 primer nucleotide sequence and a plurality of at least one second capture primers including a 3′ portion including an Illumina® P7 primer nucleotide sequence and an Illumina® SBS8 primer nucleotide sequence; c) producing a second layer on the substrate covering the first layer and the surface surrounding the well; d) depositing at least one second capture primer pair in the second layer, wherein the at least one second capture primer pair includes a plurality of first capture primers including a 3′ portion including an Illumina® P5 primer nucleotide sequence and a plurality of second capture primers including a 3′ portion including an Illumina® P7 nucleotide sequence; e) contacting a sample including a plurality of target polynucleotides with the substrate under conditions sufficient for a single target polynucleotide per well to hybridize with a primer of the at least one first capture primer pair, wherein the plurality of target polynucleotides are flanked by a complementary SBS each including a complementary Illumina® SBS3′ primer nucleotide sequence or a complementary Illumina® SBS8′ nucleotide sequence, and f) performing a KEA for an extended time to produce a monoclonal population of amplicons from the single target polynucleotide inside and outside the at least one well, thereby amplifying the single target polynucleotide inside the well and enlarging the monoclonal population of target polynucleotides beyond the at least one well.
Additionally or alternatively, detection of a genetic biomarker can include a method for amplifying a nucleic acid, including: a) producing a first layer on a substrate, wherein the substrate includes at least one well, a surface surrounding the well, and an inner well surface, wherein the first layer at least partially covers the inner well surface; b) depositing at least one first capture primer pair in the first layer, wherein the first primer pair includes a plurality of first capture primers including a 3′ portion including an Illumina® P5 primer nucleotide sequence and a plurality of second capture primers including a 3′ portion including an Illumina® P7 primer nucleotide sequence; c) producing a second layer on the substrate covering the first layer and the surface surrounding the well; d) contacting a sample including a plurality of target polynucleotides with the substrate under conditions sufficient for a single target polynucleotide per well to hybridize with a primer of the at least one first capture primer pair, wherein the plurality of polynucleotides are flanked by complementary universal primer regions each including a complementary Illumina® P5′ primer nucleotide sequence or a complementary Illumina® P7′ primer nucleotide sequence; e) performing a first KEA to produce a monoclonal population of amplicons from the single target polynucleotide inside the at least one well, thereby amplifying the target polynucleotide; f) depositing at least one second capture primer pair in the second layer, wherein the at least one second capture primer pair includes a plurality of first capture primers including a 3′ portion including an Illumina® P5 primer nucleotide sequence and a plurality of second capture primers including a 3′ portion including an Illumina® P7 primer nucleotide sequence, and g) performing bridge amplification or a second KEA to enlarge the monoclonal population of amplicons of the single target polynucleotide.
Additionally or alternatively, detection of a genetic biomarker can include a method for amplifying a nucleic acid, including: a) producing a layer on a substrate, wherein the substrate includes at least one well, a surface surrounding the well and an inner well surface, wherein the well has a diameter of about 1 μm or more and wherein the layer at least partially covers the inner well surface; b) depositing at least one first capture primer pair and at least one second capture primer pair in the layer, wherein the primer density of the at least one first capture primer pair is higher than the primer density of the at least one second capture primer pair; c) contacting a sample including a plurality of target polynucleotides with the substrate under conditions sufficient for a single target polynucleotide per well to hybridize with the second primer, and d) performing a KEA to produce a monoclonal population of amplicons from the single target polynucleotide hybridized to the second primer inside the well, thereby amplifying the single target polynucleotide.
Additionally or alternatively, detection of a genetic biomarker can include a method for modifying an immobilized capture primer including: a) contacting a substrate including a plurality of immobilized capture primers with a plurality of template nucleic acids under conditions sufficient for hybridization to produce one or more immobilized template nucleic acids, wherein the plurality of immobilized capture primers includes a first plurality of primers including a 5′-terminal universal capture region Y and a second plurality of primers including a 3′-terminal universal capture region Z, and wherein each template nucleic acid is flanked by a 5′-terminal and a 3′-terminal universal capture region Y or Z and includes one or more restriction sites and the target-specific capture region between the 5′-terminal universal capture region and the one or more restriction sites or between the 3′-terminal universal capture region and the one or more restriction sites, and b) extending one or more immobilized capture primers to produce one or more immobilized extension products complementary to the one or more template nucleic acids.
Additionally or alternatively, detection of a genetic biomarker can include a method for modifying an immobilized capture primer including: a) contacting a substrate including a plurality of immobilized capture primers with a plurality of template nucleic acids under conditions sufficient for hybridization to produce one or more immobilized template nucleic acids, wherein the plurality of immobilized capture primers includes a first plurality of primers including a 3′-terminal P5 primer nucleotide sequence and a second plurality of primers including a 3′-terminal Illumina® P7 primer nucleotide sequence, and wherein each template nucleic acid is flanked by a 3′-terminal complementary Illumina® P5′ primer nucleotide sequence and a 5′-terminal complementary Illumina® P7′ primer nucleotide sequence, and includes two SapI restriction sites, a spacer region between the SapI restriction sites, and a target-specific capture region between the 3′-terminal complementary Illumina® P5′ primer nucleotide sequence and the SapI restriction sites; b) extending one or more immobilized capture primers to produce one or more immobilized extension products complementary to the one or more template nucleic acids; c) amplifying the one or more immobilized extension products by bridge amplification or KEA to produce one or more monoclonal clusters of immobilized double-stranded template nucleic acids; d) contacting the one or more monoclonal clusters of immobilized double-stranded template nucleic acids with SapI to cut the two restriction sites in a plurality of immobilized double-stranded template nucleic acids to produce a plurality of immobilized double-stranded chimeric capture primers including the Illumina® P5 primer nucleotide sequence and the target-specific capture region and a plurality of immobilized double-stranded regenerated universal capture primers including the Illumina® P7 primer nucleotide sequence, and e) optionally, contacting the plurality of immobilized double-stranded chimeric capture primers and immobilized double-stranded regenerated universal capture primers with a 5′-3′ dsDNA-exonuclease to produce a plurality of immobilized single-stranded chimeric capture primers and a plurality of immobilized single-stranded regenerated universal capture primers.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0356030, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions, systems, and methods for detecting the presence of polymer subunits using chemiluminescence. Under one aspect, a composition includes a substrate; a first polynucleotide coupled to the substrate; a second polynucleotide hybridized to the first polynucleotide; and a catalyst coupled to a first nucleotide of the second polynucleotide, the catalyst being operable to cause a chemiluminogenic molecule to emit a photon. In some embodiments, the composition further includes a plurality of the chemiluminogenic molecules. The catalyst can cause each of the chemiluminogenic molecules to emit a corresponding photon. The composition further can include a plurality of reagent molecules, the catalyst causing each of the chemiluminogenic molecules to emit a corresponding photon by oxidizing that chemiluminogenic molecule using a reagent molecule. The oxidized chemiluminogenic molecule can have an excited state that decays by emitting the corresponding photon. A system can include any of the foregoing compositions and circuitry configured to detect the photon emitted by the chemiluminogenic molecule. In some embodiments, the circuitry further is configured to detect the presence of the first nucleotide based on detection of the photon.
Under another aspect, a method can include providing a substrate; providing a first polynucleotide coupled to the substrate; hybridizing a second polynucleotide to the first polynucleotide; coupling a first catalyst to a first nucleotide of the second polynucleotide; and causing, by the first catalyst, a first chemiluminogenic molecule to emit a photon. Under another aspect, a method of sequencing a first polynucleotide includes providing the first polynucleotide to be sequenced and coupled to a substrate; hybridizing a second polynucleotide to the first polynucleotide; and contacting the second polynucleotide with a polymerase and a plurality of nucleotides. A first subset of the plurality of nucleotides includes a first moiety, a second subset of the plurality of nucleotides includes a second moiety, a third subset of the plurality of nucleotides includes a third moiety, and a fourth subset of the plurality of nucleotides includes a fourth moiety or no moiety. The method further can include adding a nucleotide of the plurality of nucleotides to the second polynucleotide based on a sequence of the first polynucleotide. The method further can include exposing the nucleotide to a catalyst coupled to a fifth moiety; exposing the nucleotide to chemiluminogenic molecules; and detecting emission of photons or an absence of photons from the chemiluminogenic molecules. The method further can include exposing the nucleotide to a catalyst coupled to a sixth moiety; exposing the nucleotide to chemiluminogenic molecules; and detecting emission of photons or an absence of photons from the chemiluminogenic molecules. The method further can include exposing the nucleotide to a cleaver molecule; exposing the nucleotide to chemiluminogenic molecules; and detecting emission of photons or an absence of photons from the chemiluminogenic molecules.
The method further can include detecting the added nucleotide based on the detection of emission of photons or absence of photons from the chemiluminogenic molecules at one or more of the detection steps or a combination thereof. Under another aspect, a composition includes a catalyst operable to cause a chemiluminogenic molecule to emit a photon; a substrate; a first polynucleotide coupled to the substrate; a second polynucleotide hybridized to the first polynucleotide; and a quencher coupled to a first nucleotide of the second polynucleotide, the quencher operable to inhibit photon emission by the chemiluminogenic molecule. Under another aspect, a method includes providing a catalyst operable to cause a first chemiluminogenic molecule to emit a photon; providing a substrate; providing a first polynucleotide coupled to the substrate; hybridizing a second polynucleotide to the first polynucleotide; coupling a first quencher to a first nucleotide of the second polynucleotide; and inhibiting, by the first quencher, photon emission by the first chemiluminogenic molecule. Under another aspect, a method of sequencing a first polynucleotide includes providing the first polynucleotide to be sequenced and coupled to a substrate; hybridizing a second polynucleotide to the first polynucleotide; and providing a catalyst coupled sufficiently close to the second polynucleotide that a quencher coupled to the second polynucleotide can inhibit photon emission from chemiluminescent molecules that interact with the catalyst. The method further can include contacting the second polynucleotide with a polymerase and a plurality of nucleotides. A first subset of the plurality of nucleotides includes a first moiety, a second subset of the plurality of nucleotides includes a second moiety, a third subset of the plurality of nucleotides includes a third moiety, and a fourth subset of the plurality of nucleotides includes a fourth moiety or no moiety.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0137876, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method that may be used to substantially reduce or eliminate high quality errors that may be generated during first extension. In another embodiment, the methods may also be used to reduce errors caused by mis-incorporation of nucleotides during the first few cycles of amplification. Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing with improved accuracy comprising: providing a nucleic acid template; producing, by linear amplification directly from the template nucleic acid, a population comprising a plurality of complementary strands retained in close proximity to each other or identifiable as being obtained from the same template nucleic acid; and performing a sequencing reaction on said proximity retained (e.g., surface bound) oligonucleotides. In some embodiments, the method further comprises the step of carrying out further (exponential) amplification of the population of complementary strands after the rounds of linear amplification and prior to performing the sequencing reaction. Optionally the linear amplification (directly from the nucleic acid template) includes the steps of: hybridising said nucleic acid template to a first primer; extending the first primer to produce a complementary strand to the template; denaturing to release the complementary strand which remains in close proximity (e.g., it remains bound to the surface and thus does not travel at all or does not diffuse far before re-hybridising nearby); and repeating the hybridisation and amplification steps to produce a population of surface bound complementary strands obtained directly from the template nucleic acid.
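
A toy model suggests why several rounds of linear amplification before exponential amplification could dilute a first-extension error. The sketch assumes the worst case in which the very first extension mis-incorporates a nucleotide, every later copy is faithful to its parent strand, and each exponential cycle exactly doubles every strand. These are simplifying assumptions for illustration and are not taken from the source.

```python
def erroneous_fraction(linear_rounds: int, exponential_cycles: int) -> float:
    """Fraction of strands in the final population that carry the error.

    linear_rounds: extensions made directly from the original template
        (1 corresponds to a single first extension with no extra rounds).
    exponential_cycles: doubling cycles applied to the whole population.
    """
    if linear_rounds < 1 or exponential_cycles < 0:
        raise ValueError("need at least one extension and no negative cycles")

    correct = 1    # the original template
    erroneous = 0
    for round_index in range(linear_rounds):
        if round_index == 0:
            erroneous += 1   # worst case: the first copy carries the error
        else:
            correct += 1     # later copies come from the correct template

    for _ in range(exponential_cycles):
        correct *= 2         # each lineage doubles, so the ratio is preserved
        erroneous *= 2

    return erroneous / (correct + erroneous)


no_linear_rounds = erroneous_fraction(linear_rounds=1, exponential_cycles=10)
nine_linear_rounds = erroneous_fraction(linear_rounds=9, exponential_cycles=10)
```

In this model the error is carried by half of the strands when exponential amplification starts right after the first extension, and by a smaller share when more copies are first made directly from the template.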

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0101676, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include techniques for enrichment of target sequences in a nucleic acid library and reducing the capture of off-target sequences by a set of target hybridization probes. Because target hybridization probes have imperfect specificity for their nucleic acid targets, a sequencing run using a set of target hybridization probes may also include a certain percentage of reads that represent sequences that are off-target. For example, in an exome sequencing reaction, certain hybridization probes may pull down intronic or intergenic sequences from a nucleic acid library along with target sequences. These off-target fragments, once pulled down, are then present in the pool of nucleic acid fragments that are sequenced. While the sequencing information representative of the off-target reads is typically discarded, the present techniques use acquired sequencing information of these off-target reads to design hybridization probes that are specific for the off-target sequences and that are used to separate and/or remove fragments that include these sequences from the pool of fragments captured by the target-specific hybridization probes. The off-target hybridization probes are designed based on analysis of the off-target reads of a hybrid capture sequencing run that is performed with a set of target hybridization probes. In certain embodiments, the on-target probe design may also be based on systematic off-target analysis across samples to improve the specificity of the target hybridization probes for their desired targets. Additionally or alternatively, detection of a genetic biomarker can include a method of reducing off-target capture in a targeted sequencing reaction.
The method includes the steps of providing a set of off-target hybridization probes that specifically bind to a plurality of off-target sequences present in a nucleic acid library generated from a sample, the nucleic acid library comprising a plurality of nucleic acid fragments and providing a set of target-specific hybridization probes that specifically bind to a plurality of target sequences present in the nucleic acid library. The method also includes the steps of contacting the off-target hybridization probes with the nucleic acid library under conditions whereby the off-target hybridization probes hybridize to the off-target sequences and contacting the target-specific hybridization probes with the nucleic acid library under conditions whereby the target-specific hybridization probes hybridize to the target sequences. The method also includes the steps of selecting a group of nucleic acid fragments from the nucleic acid library bound to the target-specific hybridization probes; and sequencing the group of nucleic acid fragments bound to the target-specific hybridization probes. Additionally or alternatively, detection of a genetic biomarker can include a method of providing probes for off-target sequence capture in a targeted sequencing reaction. The method includes the steps of receiving a request for a set of target-specific hybridization probes. The method also includes the steps of contacting the target-specific hybridization probes with a reference nucleic acid library generated from a reference sample, the nucleic acid library comprising a plurality of nucleic acid fragments, to generate a reference group of target-specific and off-target nucleic acid fragments bound to the target-specific hybridization probes and separating the reference group of nucleic acid fragments bound to the target-specific hybridization probes from unbound nucleic acid fragments.
The method also includes the steps of sequencing the reference group of nucleic acid fragments to generate reference sequencing data; identifying off-target sequences in the reference sequencing data; and providing a set of off-target hybridization probes based on the identified off-target sequences. Additionally or alternatively, detection of a genetic biomarker can include a sequencing kit for reducing off-target capture in a targeted sequencing reaction that includes a set of off-target hybridization probes that specifically bind to a plurality of off-target sequences present in a nucleic acid library generated from a sample, the nucleic acid library comprising a plurality of nucleic acid fragments and a set of target-specific hybridization probes that specifically bind to a plurality of target sequences present in the nucleic acid library.
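As a purely illustrative sketch of the step of identifying off-target sequences in reference sequencing data, the following Python fragment flags aligned reads that overlap no target interval and reports genomic bins in which such reads recur. The interval representation, the function names, and the `bin_size` and `min_reads` parameters are assumptions made for this example; they are not taken from the incorporated publication.

```python
from collections import Counter

def overlaps(read, target):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return read[0] == target[0] and read[1] < target[2] and target[1] < read[2]

def off_target_bins(reads, targets, bin_size=1000, min_reads=2):
    """Return bins (chrom, bin_start) holding at least min_reads off-target reads.

    A read is treated as off-target when it overlaps none of the target
    intervals. The bin size and read threshold are hypothetical tuning values.
    """
    counts = Counter()
    for read in reads:
        if not any(overlaps(read, t) for t in targets):
            counts[(read[0], read[1] // bin_size * bin_size)] += 1
    return sorted(b for b, n in counts.items() if n >= min_reads)

# Hypothetical reference run: one target region and five aligned reads.
targets = [("chr1", 1000, 2000)]
reads = [
    ("chr1", 1100, 1250),  # on target
    ("chr1", 1900, 2050),  # partially on target
    ("chr1", 5100, 5250),  # off target
    ("chr1", 5300, 5450),  # off target, same bin as the read above
    ("chr2", 700, 850),    # off target, but seen only once
]
candidates = off_target_bins(reads, targets)
```

Bins returned in `candidates` would be the regions for which off-target hybridization probes might then be designed.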

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2016/0319345, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, apparatus, systems, and computer program products for determining nucleic acid fragment sequences using unique molecular indices (UMIs). In various implementations, sequencing methods determine the sequences of nucleic acid fragments from both strands of the nucleic acid fragments. In some implementations, the methods employ physical UMIs located on one or both strands of sequencing adapters. In some implementations, the methods also employ virtual UMIs located on both strands of the nucleic acid fragments. One aspect relates to a method for sequencing nucleic acid molecules from a sample using unique molecular indices (UMIs). Each unique molecular index (UMI) is an oligonucleotide sequence that can be used to identify an individual molecule of a double-stranded DNA fragment in the sample. The method includes: (a) applying adapters to both ends of double-stranded DNA fragments in the sample, wherein the adapters each include a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and a physical UMI on one strand or each strand of the adapters, thereby obtaining DNA-adapter products; (b) amplifying both strands of the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads each associated with a physical UMI; (d) identifying a plurality of physical UMIs associated with the plurality of reads; (e) identifying a plurality of virtual UMIs associated with the plurality of reads, wherein each virtual UMI is a sequence found in a DNA fragment in the sample; and (f) determining sequences of the double-stranded DNA fragments in the
sample using the plurality of reads obtained in (c), the plurality of physical UMIs identified in (d), and the plurality of virtual UMIs identified in (e). In some implementations, operation (f) includes: (i) combining, for each of one or more of the double-stranded DNA fragments in the sample, (1) reads having a first physical UMI and at least one virtual UMI in the 5′ to 3′ direction and (2) reads having a second physical UMI and the at least one virtual UMI in the 5′ to 3′ direction to determine a consensus nucleotide sequence; and (ii) determining, for each of the one or more of the double-stranded DNA fragments in the sample, a sequence using the consensus nucleotide sequence. In some implementations, the adapters each include a physical UMI on only one strand of the adapters on the single-stranded 5′ arm or the single-stranded 3′ arm. In some of these implementations, (f) includes: (i) collapsing reads having a same first physical UMI into a first group to obtain a first consensus nucleotide sequence; (ii) collapsing reads having a same second physical UMI into a second group to obtain a second consensus nucleotide sequence; and (iii) determining, using the first and second consensus nucleotide sequences, a sequence of one of the double-stranded DNA fragments in the sample. In some implementations, (iii) includes: (1) obtaining, using localization information and sequence information of the first and second consensus nucleotide sequences, a third consensus nucleotide sequence, and (2) determining, using the third consensus nucleotide sequence, the sequence of one of the double-stranded DNA fragments. In some implementations, operation (e) includes identifying the plurality of virtual UMIs, while the adapters each include the physical UMI on only one strand of the adapters in the single-stranded 5′ arm region or the single-stranded 3′ arm region.
In some implementations, (f) includes: (i) combining reads having a first physical UMI and at least one virtual UMI in the 5′ to 3′ direction and reads having a second physical UMI and the at least one virtual UMI in the 5′ to 3′ direction to determine a consensus nucleotide sequence; and (ii) determining a sequence of one of the double-stranded DNA fragments in the sample using the consensus nucleotide sequence. In some implementations of the methods above, obtaining the plurality of reads in operation (c) includes: obtaining two paired-end reads from each of the amplified polynucleotides, wherein the two paired-end reads include a long read and a short read, the long read being longer than the short read. In some of these implementations, operation (f) includes: combining read pairs associated with a first physical UMI into a first group and combining read pairs associated with a second physical UMI into a second group, wherein the first and the second physical UMIs are uniquely associated with a double-stranded fragment in the sample; and determining the sequence of the double-stranded fragment in the sample using sequence information of long reads in the first group and sequence information of long reads in the second group.
Another aspect relates to a method for sequencing nucleic acid molecules from a sample. The method includes: (a) attaching adapters to both ends of double-stranded DNA fragments in the sample, wherein the adapters each include a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and a physical unique molecular index (UMI) on the single-stranded 5′ arm or the single-stranded 3′ arm; (b) amplifying both strands of ligation products from (a), thereby obtaining a plurality of single-stranded, amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads each associated with a physical UMI; (d) identifying a plurality of physical UMIs associated with the plurality of reads; and (e) determining sequences of the double-stranded DNA fragments in the sample using the plurality of sequences obtained in (c) and the plurality of physical UMIs identified in (d). An additional aspect relates to a method for sequencing nucleic acid molecules from a sample. The method includes: (a) attaching adapters to both ends of double-stranded DNA fragments in the sample, wherein the adapters each include a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and a physical unique molecular index (UMI) shorter than 12 nucleotides on one strand or each strand of the adapters; (b) amplifying both strands of ligation products from (a), thereby obtaining a plurality of single-stranded, amplified polynucleotides each including a physical UMI; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads each associated with a physical UMI; (d) identifying a plurality of physical UMIs associated with the plurality of reads; and (e) determining sequences of the double-stranded DNA fragments in the sample using the plurality of reads obtained in (c) and the plurality of physical UMIs identified in (d).
Another aspect relates to a method for making a duplex sequencing adapter having a physical UMI on each strand. The method includes: providing a preliminary sequencing adapter including a double-stranded hybridized region, two single-stranded arms, and an overhang including 5′-CCANNNNANNNNTGG-3′ at an end of the double-stranded hybridized region that is further away from the two single-stranded arms; extending one strand of the double-stranded hybridized region using the overhang as a template, thereby producing an extension product; and applying restriction enzyme Xcm1 to digest a double-stranded end of the extension product, thereby producing the duplex sequencing adapter having a physical UMI on each strand. In some implementations, the preliminary sequencing adapter includes a read primer sequence on each strand. A further aspect relates to a computer program product including a non-transitory machine readable medium storing program code that, when executed by one or more processors of a computer system, causes the computer system to implement a method for determining sequence information of a sequence of interest in a sample using unique molecular indices (UMIs).
The program code includes: (a) code for obtaining reads of a plurality of amplified polynucleotides, wherein the plurality of amplified polynucleotides are obtained by amplifying double-stranded DNA fragments in the sample including the sequence of interest and attaching adapters to the double-stranded DNA fragments; (b) code for identifying a plurality of physical UMIs in the reads of the plurality of amplified polynucleotides, wherein each physical UMI is found in an adapter attached to one of the double-stranded DNA fragments; (c) code for identifying a plurality of virtual UMIs in the received reads of the plurality of amplified polynucleotides, wherein each virtual UMI is found in an individual molecule of one of the double-stranded DNA fragments; and (d) code for determining sequences of the double-stranded DNA fragments using the reads of the plurality of amplified polynucleotides, the plurality of physical UMIs, and the plurality of virtual UMIs, thereby reducing errors in the determined sequences of the double-stranded DNA fragments. In some implementations, the adapters each include a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and a physical unique molecular index (UMI) on one strand or each strand of the adapters. An additional aspect relates to a computer system, including: one or more processors; system memory; and one or more computer-readable storage media. The media have stored thereon computer-executable instructions that cause the computer system to implement a method to determine sequence information of a sequence of interest in a sample using unique molecular indices (UMIs), which are oligonucleotide sequences that can be used to identify individual molecules of double-stranded DNA fragments in the sample.
The instructions include: (a) receiving reads of a plurality of amplified polynucleotides, wherein the plurality of amplified polynucleotides are obtained by amplifying double-stranded DNA fragments in the sample including the sequence of interest and attaching adapters to the double-stranded DNA fragments; (b) identifying a plurality of physical UMIs in the received reads of the plurality of amplified polynucleotides, wherein each physical UMI is found in an adapter attached to one of the double-stranded DNA fragments; (c) identifying a plurality of virtual UMIs in the received reads of the plurality of amplified polynucleotides, wherein each virtual UMI is found in an individual molecule of one of the double-stranded DNA fragments; and (d) determining sequences of the double-stranded DNA fragments using the sequences of the plurality of amplified polynucleotides, the plurality of physical UMIs, and the plurality of virtual UMIs, thereby reducing errors in the determined sequences of the double-stranded DNA fragments. One aspect provides methods for sequencing nucleic acid molecules from a sample using nonrandom unique molecular indices (UMIs).
The methods involve: (a) applying adapters to both ends of DNA fragments in the sample, wherein the adapters each include a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and a nonrandom unique molecular index (UMI) on one strand or each strand of the adapters, thereby obtaining DNA-adapter products; (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with a plurality of nonrandom UMIs; (d) from the plurality of reads, identifying reads sharing a common nonrandom UMI; and (e) from the identified reads sharing the common nonrandom UMI, determining the sequence of at least a portion of a DNA fragment, from the sample, having an applied adapter with the common nonrandom UMI. Another aspect relates to methods for sequencing nucleic acid molecules from a sample using nonrandom unique molecular indices (UMIs). In some implementations, a method involves: (a) applying adapters to both ends of double-stranded DNA fragments in the sample, wherein the adapters each include a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and a nonrandom unique molecular index (UMI) on one strand or each strand of the adapters, thereby obtaining DNA-adapter products, wherein the nonrandom UMI can be combined with other information to uniquely identify an individual molecule of the double-stranded DNA fragments; (b) amplifying both strands of the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads each associated with a nonrandom UMI; (d) identifying a plurality of nonrandom UMIs associated with the plurality of reads; and (e) using the plurality of reads and the plurality of nonrandom UMIs to determine sequences of the double-stranded DNA fragments in the sample.
Additionally or alternatively, detection of a genetic biomarker can include a system, apparatus, and computer program products for determining DNA fragment sequences implementing the methods disclosed. One aspect provides a computer program product including a non-transitory machine readable medium storing program code that, when executed by one or more processors of a computer system, causes the computer system to implement a method to determine sequence information of a sequence of interest in a sample using unique molecular indices (UMIs). The program code includes instructions to perform the methods above.
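The collapsing of reads that share a physical UMI into a consensus sequence, and the combination of two strand-level consensus sequences, can be sketched in Python as follows. This is a minimal illustration only: it assumes reads are already trimmed to equal length and paired with their UMI, uses a simple per-position majority vote, and masks strand disagreements with "N". The function names and the masking rule are assumptions for this example, not details of the incorporated publication.

```python
from collections import Counter, defaultdict

def consensus(seqs):
    """Majority base at each position across equal-length reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def collapse_by_umi(reads):
    """Group (umi, sequence) pairs by UMI and return one consensus per UMI."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return {umi: consensus(seqs) for umi, seqs in groups.items()}

def duplex_consensus(first, second):
    """Combine two strand-level consensus sequences.

    Positions where the two sequences disagree are masked with "N"
    (an assumed rule chosen for illustration).
    """
    return "".join(a if a == b else "N" for a, b in zip(first, second))

# Hypothetical reads: (physical UMI, read sequence).
reads = [
    ("ACGT", "TTGACCA"),
    ("ACGT", "TTGACCA"),
    ("ACGT", "TTGTCCA"),  # one read carries an error at the fourth base
    ("GGCA", "TTGACCA"),
    ("GGCA", "TTGACCA"),
]
collapsed = collapse_by_umi(reads)
fragment = duplex_consensus(collapsed["ACGT"], collapsed["GGCA"])
```

Here the single erroneous read is outvoted within its UMI group, so both groups yield the same consensus and `fragment` contains no masked positions.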

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0360193, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, compositions and kits for the amplification of nucleic acid samples to generate nucleic acid libraries. Additionally or alternatively, detection of a genetic biomarker can include a method of creating a nucleic acid library from a nucleic acid sample, the method comprising: a) providing a set of amplification primers to a nucleic acid sample, the set of amplification primers comprising a plurality of random primers and a plurality of locus specific primers, wherein the locus specific primers are configured to amplify a plurality of predetermined regions of the nucleic acid library, and wherein the random primers are in greater abundance compared to the locus specific primers; and b) amplifying the nucleic acid library using the set of amplification primers, thereby creating a nucleic acid library. Also presented is a kit for amplifying a nucleic acid sample, wherein the kit comprises a plurality of random primers and a plurality of locus specific primers configured to amplify a plurality of predetermined regions of a nucleic acid library. In certain aspects, the kit further comprises a set of instructions for using the random primers and the locus specific primers in an amplification reaction set, wherein the random primers are in greater abundance compared to the locus specific primers. In certain aspects, the kit further comprises a set of instructions for combining the set of amplification primers with a nucleic acid library and amplifying the nucleic acid library.
In addition to the foregoing method, also presented is a method of creating a nucleic acid library from a nucleic acid sample, the method comprising: a) amplifying a nucleic acid sample with an AT-rich set of random amplification primers. In certain aspects, the AT-rich set of random amplification primers is a mixture of primers. Also presented is a kit for amplifying a nucleic acid sample, wherein the kit comprises an AT-rich set of random amplification primers. In certain aspects, the kit further comprises a set of instructions for combining the set of amplification primers with a nucleic acid library and amplifying the nucleic acid library. In certain other aspects, the kit further comprises a DNA polymerase. In still other aspects, the AT-rich set of random amplification primers is a mixture of primers. Additionally or alternatively, detection of a genetic biomarker can include a method of creating a nucleic acid library from a nucleic acid sample, the method comprising: a) amplifying a nucleic acid sample with a set of random amplification primers, the random amplification primers comprising AT-rich 5′ tails. In certain aspects, the set of random amplification primers is a mixture of primers. Also presented is a method of creating a nucleic acid library from a nucleic acid sample, the method comprising: amplifying a nucleic acid sample with a set of variable-length random amplification primers, wherein each variable-length random amplification primer comprises a random 3′ portion and a degenerate 5′ tail, the degenerate 5′ tail being proportional in length to the A/T content of the random 3′ portion of the primer. In certain aspects, the set of variable-length random amplification primers is a mixture of primers.
Also presented is a method of creating a nucleic acid library from a nucleic acid sample, the method comprising: a) amplifying a nucleic acid sample with a set of random amplification primers, wherein each primer comprises a random 3′ portion and a constant 5′ priming portion, thereby producing amplification products, wherein each amplification product comprises the constant 5′ priming portion; b) circularizing the amplification products; and c) amplifying the circularized amplification products using primers which hybridize to the constant 5′ priming portion. In certain aspects, the amplifying in step (c) comprises performing multiple displacement amplification. In certain aspects, the set of random amplification primers is a mixture of primers.
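The variable-length primers described above, in which the degenerate 5′ tail is proportional in length to the A/T content of the random 3′ portion, can be illustrated with a short Python sketch. The proportionality constant `bases_per_at` and the use of "N" to denote a degenerate position are assumptions made for this example; the incorporated publication may define the tail differently.

```python
def degenerate_tail_length(random_3prime, bases_per_at=1):
    """Tail length proportional to the number of A/T bases in the 3' portion."""
    at_count = sum(base in "AT" for base in random_3prime.upper())
    return at_count * bases_per_at

def build_primer(random_3prime, bases_per_at=1):
    """Prepend a degenerate 5' tail (written as N) to the random 3' portion."""
    tail = "N" * degenerate_tail_length(random_3prime, bases_per_at)
    return tail + random_3prime.upper()

# Hypothetical 3' portions with high and zero A/T content.
at_rich = build_primer("ATATGC")
gc_only = build_primer("GCGCGC")
```

With these inputs the AT-rich 3′ portion receives a four-base tail, while the GC-only 3′ portion receives none.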

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0176071, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include determining proximity of sequence fragments with respect to a larger target nucleic acid from which the fragments were derived. For example, the methods can be used to determine phasing and to identify haplotypes for a relatively long target nucleic acid sequence when individual sequence reads are shorter than the length of the target nucleic acid under evaluation. Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing a target nucleic acid polymer. The method can include the steps of (a) modifying a target nucleic acid polymer to produce a modified nucleic acid polymer, wherein the modified nucleic acid polymer includes a plurality of sequence regions from the target nucleic acid polymer; (b) producing fragments of the modified nucleic acid polymer in a vessel having a solid support surface, each fragment comprising one of the sequence regions; (c) capturing the fragments randomly at locations in a region of the solid support surface; (d) determining nucleotide sequences of the sequence regions by detecting the fragments at the locations; and (e) producing a representation of the nucleotide sequence for the target nucleic acid polymer based on the nucleotide sequences from the fragments and the relative distances between the locations on the solid support surface.
Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing a target nucleic acid polymer that includes the steps of (a) adding inserts into a target nucleic acid polymer to form a modified nucleic acid polymer including a plurality of internal inserts; (b) producing fragments of the modified nucleic acid polymer in a fluid that is in contact with a solid support surface, thereby releasing fragments that each include at least a portion of the inserts; (c) capturing the fragments from the fluid randomly at locations on a solid support surface; (d) determining nucleotide sequences from the fragments by detecting the fragments at the locations; and (e) producing a representation of the nucleotide sequence for the target nucleic acid polymer based on the nucleotide sequences from the fragments and the relative distances between the locations on the solid support surface. Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing a target nucleic acid polymer, that includes the steps of (a) modifying a target nucleic acid polymer to produce a modified nucleic acid polymer, wherein the modified nucleic acid polymer includes a plurality of sequence regions from the target nucleic acid polymer; (b) attaching the modified nucleic acid polymer to a region on a solid support surface; (c) producing fragments of the modified nucleic acid polymer that is attached to the solid support surface, wherein the fragments are attached to locations at the region of the solid support surface; (d) determining nucleotide sequences from the fragments by detecting the fragments at the locations; and (e) producing a representation of the nucleotide sequence for the target nucleic acid polymers based on the nucleotide sequences from the fragments and the relative distances between the locations on the solid support surface.
Additionally or alternatively, detection of a genetic biomarker can include a method of determining the source for individual sequences in a mixture of sequences from different sources. The method can include the steps of (a) providing a mixture of target nucleic acid polymers from a plurality of different sources; (b) modifying the mixture of target nucleic acid polymers to produce a mixture of modified nucleic acid polymers, wherein the mixture of modified nucleic acid polymers includes a plurality of sequence regions from the different sources; (c) producing fragments of the modified nucleic acid polymers in a vessel having a solid support surface, each fragment comprising a sequence region from a single one of the different sources; (d) capturing the fragments randomly at locations of the solid support surface, under conditions wherein fragments from a common target nucleic acid polymer preferentially localize to proximal locations on the solid support surface; (e) determining nucleotide sequences of the fragments at the locations; and (f) identifying the nucleotide sequences that are derived from a common source in the plurality of different sources based on the nucleotide sequences from the fragments and the relative distances between the locations on the solid support surface.
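The use of relative distances between capture locations to identify fragments derived from a common polymer can be sketched as a simple distance-threshold grouping. The Python fragment below is an illustrative assumption, not the procedure of the incorporated publication: it links any two locations closer than a hypothetical `max_dist` and reports the resulting connected groups.

```python
import math

def group_by_proximity(locations, max_dist):
    """Group location indices whose points are chained within max_dist.

    Single-linkage grouping via union-find; max_dist is a hypothetical
    threshold chosen for illustration.
    """
    parent = list(range(len(locations)))

    def find(i):
        # Walk to the group root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(locations)):
        for j in range(i + 1, len(locations)):
            if math.dist(locations[i], locations[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(locations)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical (x, y) capture locations of five fragments on the surface.
locations = [(0, 0), (1, 0), (10, 10), (11, 10), (1, 1)]
groups = group_by_proximity(locations, max_dist=2)
```

In this example fragments 0, 1 and 4 fall into one group and fragments 2 and 3 into another; the sequences read at the locations within each group would then be treated as candidates for a common source.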

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2014/0364323, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for determining the presence of a plurality of nucleotide sequences of interest in a plurality of samples, while preserving the identity of each sample. The method can be used in many applications, including genotyping, expression analysis, and identification of individual species in complex samples. In one embodiment, each sample is contacted with a plurality of probe sets. A first probe has a first identification sequence and a first hybridization sequence complementary to a first portion of the sequence of interest. A second probe has a second hybridization sequence complementary to a second portion of the same sequence of interest and a second identification sequence. If the first hybridization sequence is hybridized to the first portion of the sequence of interest, and the second hybridization sequence is hybridized to the second portion of the same sequence of interest, then the first and second probes are joined. This can also be performed using ligation and/or extension methods, such as with a GoldenGate® assay design. The presence of the sequence of interest and the identity of the sample containing the sequence of interest are determined, based on identification sequence codes present in the joined probes.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2013/0059741, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions and methods for assaying the presence of a target analyte in a sample using a solid support. Additionally or alternatively, detection of a genetic biomarker can include a solid support having a binding protein, such as an antibody, antibody fragment or protein receptor, immobilized to the solid support and at least two separate nucleic acid primers immobilized near the binding protein. Additionally or alternatively, detection of a genetic biomarker can include a solid support wherein a binding complex is formed between the binding protein immobilized to the solid support, a target analyte and a second binding protein. In some embodiments, a solid support is provided wherein such a binding complex further forms a hybridization complex between one nucleic acid primer immobilized on the solid support and an oligonucleotide tag linked to the second binding protein. Additionally or alternatively, detection of a genetic biomarker can include an array that includes a plurality of these solid supports. In some embodiments, solid supports can be used in a method for detecting numerous target analytes.
In one embodiment, the method for detecting a target analyte includes providing a solid support having a binding protein immobilized to the solid support and a second binding protein provided in solution, wherein the first binding protein recognizes and is capable of binding a target analyte in the presence of the second binding protein, which also recognizes and binds the same target analyte, contacting the solid support with target analyte and the second binding protein under sufficient conditions to allow formation of a binding complex between the target analyte and both the first and second binding proteins, hybridizing the oligonucleotide tag linked to the second binding protein to a first nucleic acid primer immobilized on the solid support, extending this first primer whereby a complement of the oligonucleotide tag is generated, amplifying the newly generated complement using a second nucleic acid primer immobilized to the solid support and detecting the presence of the amplicon, wherein the presence of the amplicon indicates the presence of the target analyte. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a target analyte, wherein the method described above alternatively proceeds following the extension step by hybridizing the complement of the oligonucleotide tag that is generated, to a second nucleic acid primer immobilized on the solid support forming a second hybridization complex, then extending the second nucleic acid primer with at least one labeled nucleic acid residue, using methods such as single base extension or sequencing by synthesis, wherein the nucleic acid residue added to the primer is dependent on the nucleic acid sequence of the oligonucleotide tag, followed by detecting the presence of the labeled nucleic acid residue on the solid surface, wherein the presence of the labeled nucleic acid residue indicates the presence of the target analyte.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2012/0156753, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for 5′ ligation tagging of uncapped RNA in a sample that has a 5′ polyphosphate group, comprising: (A) providing: (i) a sample that contains uncapped RNA that has a 5′ polyphosphate group, including wherein the sample additionally contains RNA that has a 5′ monophosphate group and/or capped RNA and/or RNA that has a 5′ hydroxyl group; (ii) RNA 5′ polyphosphatase; (iii) an acceptor oligonucleotide that exhibits a tag; and (iv) RNA ligase; (B) contacting the sample with the RNA 5′ polyphosphatase under conditions and for sufficient time wherein the uncapped RNA that has a 5′ polyphosphate group is converted to RNA that has a 5′ monophosphate group; and (C) contacting the sample from step (B) with the acceptor oligonucleotide and the RNA ligase under conditions and for sufficient time wherein the 3′ end of the acceptor oligonucleotide is ligated to RNA that has a 5′ monophosphate group but not to the capped RNA and 5′-ligation-tagged RNA is generated. 
In other embodiments, the sample provided in step (A) additionally contains RNA that has a 5′ monophosphate group but the acceptor oligonucleotide is only ligated to the RNA that has a 5′ monophosphate group which was converted from the uncapped RNA that has a 5′ polyphosphate group in step (B) and is not ligated to the RNA that has a 5′ monophosphate group already in the sample provided in step (A), wherein the method additionally comprises the substeps of: providing an RNA 5′ monophosphatase; and, prior to step (B), contacting the sample with the RNA 5′ monophosphatase under conditions and for sufficient time wherein RNA in the sample that has a 5′ monophosphate group is converted to RNA that has a 5′ hydroxyl group; and inactivating or removing the RNA 5′ monophosphatase. In other embodiments, the method additionally comprises 5′ ligation tagging of the capped RNA in the sample, wherein the method additionally comprises the substeps of: providing a nucleic acid pyrophosphatase or decapping enzyme; and, prior to step (C), contacting the sample from step (B) with the nucleic acid pyrophosphatase or the decapping enzyme under conditions and for sufficient time wherein capped RNA in the sample is converted to RNA that has a 5′ monophosphate group, whereby the capped RNA contained in the sample provided in step (A) is also 5′-ligation tagged in step (C).

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2012/0010091, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for preparing a cDNA library from a plurality of single cells. In one aspect, the method includes the steps of releasing mRNA from each single cell to provide a plurality of individual mRNA samples, synthesizing a first strand of cDNA from the mRNA in each individual mRNA sample and incorporating a tag into the cDNA to provide a plurality of tagged cDNA samples, pooling the tagged cDNA samples and amplifying the pooled cDNA samples to generate a cDNA library having double-stranded cDNA. In some embodiments, a cDNA library can be produced by the above methods. Additionally or alternatively, detection of a genetic biomarker can include methods for analyzing gene expression in a plurality of cells by preparing a cDNA library as described above and sequencing the library.
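
By way of non-limiting illustration only, one way that reads from a pooled, tagged cDNA library might be attributed back to individual cells after sequencing is sketched below. The sketch assumes that each read has already been reduced to a (tag, gene) pair and that a lookup table from tag to cell is available; the function name `count_genes_per_cell` and both data structures are hypothetical and are not taken from the referenced publication.

```python
from collections import Counter, defaultdict


def count_genes_per_cell(pooled_reads, tag_to_cell):
    """Tally gene observations per cell from pooled, tagged reads.

    pooled_reads: iterable of (tag, gene) pairs (assumed pre-extracted).
    tag_to_cell:  dict mapping each tag to the cell it was assigned to.
    Reads whose tag is not in the table are ignored.
    """
    counts = defaultdict(Counter)
    for tag, gene in pooled_reads:
        cell = tag_to_cell.get(tag)
        if cell is not None:
            counts[cell][gene] += 1
    # Convert to plain dictionaries for easy inspection.
    return {cell: dict(genes) for cell, genes in counts.items()}


if __name__ == "__main__":
    reads = [("AAAC", "TP53"), ("AAAC", "KRAS"), ("GGTT", "TP53"),
             ("AAAC", "TP53"), ("NNNN", "KRAS")]
    tags = {"AAAC": "cell_1", "GGTT": "cell_2"}
    print(count_genes_per_cell(reads, tags))
```

In this sketch the tag serves only as a key for grouping; how tags are designed, incorporated, and read out is left to the referenced methods.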

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2011/0152111, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of detecting a target nucleic acid sequence in an archived tissue sample comprising providing a nucleic acid sample prepared from an archived tissue sample, hybridizing a first set of ligation probes to said target sequence to form a ligation structure, ligating said probes using a ligase to form a ligated probe, amplifying said ligated probe to form amplicons, and detecting said amplicons. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a plurality of target nucleic acid sequences in an archived tissue sample comprising providing a nucleic acid sample prepared from an archived tissue sample, said sample comprising a plurality of target nucleic acid sequences, adding a plurality of detection probes, each substantially complementary to one of said target nucleic acid sequences, providing an enzyme to form modified detection probes, amplifying said modified detection probes to form amplicons and detecting said amplicons. 
Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a plurality of target nucleic acid sequences in an archived tissue sample comprising providing a nucleic acid sample prepared from an archived tissue sample, said sample comprising a plurality of target nucleic acid sequences, hybridizing a plurality of sets of ligation probes to said target sequence to form a plurality of ligation structures, ligating each of said plurality of ligation structures using a ligase to form a plurality of ligated probes, amplifying said ligated probes to form a plurality of amplicons and detecting said amplicons as an indication of the presence of said plurality of target nucleic acid sequences.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/019456, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and compositions that facilitate the characterization of transcriptomes and/or genomic variation in tissues while preserving spatial information related to the origin of target nucleic acids in the tissue. For example, the methods can enable the identification of the location of a cell or a cell cluster in a tissue biopsy that carries an aberrant mutation. The methods can therefore be useful for diagnostic purposes, e.g., for the diagnosis of cancer, and possibly aid in the selection of targeted therapies. Additionally or alternatively, detection of a genetic biomarker can include a capture array for spatial detection and analysis of nucleic acids in a tissue sample, comprising a capture site comprising a pair of capture probes immobilized on a surface, wherein a first capture probe of the pair of capture probes comprises a first primer binding region and a spatial address region, and wherein a second capture probe of the pair of capture probes comprises a second primer binding region and a capture region. Additionally or alternatively, detection of a genetic biomarker can include a method for spatial detection and analysis of nucleic acids in a tissue sample that includes (a) providing a capture array, comprising a capture site comprising a pair of capture probes immobilized on a surface, wherein a first capture probe of the pair of capture probes comprises a first primer binding region and a spatial address region, and wherein a second capture probe of the pair of capture probes comprises a second primer binding region and a capture region. 
Additionally or alternatively, detection of a genetic biomarker can include a method for spatial detection and analysis of nucleic acids in a tissue sample that includes providing a magnetic nanoparticle comprising an immobilized capture probe comprising a capture region. Additionally or alternatively, detection of a genetic biomarker can include a capture array for spatial detection and analysis of nucleic acids in a tissue sample, comprising a capture site comprising a capture probe comprising a spatial address region, and a transposon end (TE) region.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2016/130704, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and compositions relating to evaluating components of a single cell preserved or embedded or contained within a contiguity preserving element (CE). In one aspect, methods are provided for analyzing a plurality of analyte types from a single cell. In some embodiments, a plurality of contiguity preserving elements (CE) are provided, each CE comprising a single cell. The cells are lysed within the CE such that the plurality of analytes within the single cell are released within the CE. In some embodiments, a plurality of types of reporter moieties are provided such that each type of reporter moiety is specific for each type of analyte. In some embodiments, the reporter moiety identifies a single cell. The plurality of analytes are modified such that each type of analyte comprises a reporter moiety specific for the analyte type. In some embodiments, the CE comprising the analytes comprising said reporter moieties are combined. In some embodiments, the combined CE comprising the analytes comprising said reporter moieties are compartmentalized. In some embodiments, additional reporter moieties are provided and combined with the analytes comprising said reporter moieties such that the analytes comprise two or more different reporter moieties. The analytes comprising the reporter moieties are analyzed such that the identity of the analyte is detected and the reporter moiety identifies the source of the analyte from a single cell. 
In some embodiments, the exemplary plurality of analytes include but are not limited to DNA, RNA, cDNA, protein, lipids, carbohydrates, cellular organelles (e.g., nucleus, Golgi apparatus, ribosomes, mitochondria, endoplasmic reticulum, chloroplast, cell membrane, etc.), cellular metabolites, tissue sections, cells, single cell, contents from cells or from a single cell, nucleic acid isolated from cells or from a single cell, or nucleic acid isolated from cells or from a single cell and further modified, or cell free DNA (e.g., from placental fluid or plasma). In some embodiments, the plurality of analytes include genomic DNA and mRNA. In some embodiments, the mRNA has a poly A tail. In some embodiments, the genomic DNA and the mRNA are immobilized on a solid support within the CE simultaneously. In some embodiments, the immobilization of the genomic DNA is sequential to the immobilization of the mRNA to the solid support. In some embodiments, the genomic DNA is combined with transposome complexes and the transposon ends are immobilized on a solid support and the mRNA are immobilized to the solid support by hybridization of oligo (dT) probes immobilized on a solid support. In some embodiments, the genomic DNA is combined with transposome complexes and, optionally, the transposon ends hybridize to complementary sequences immobilized on a solid support such that the mRNA are immobilized to the solid support by hybridization of oligo (dT) probes immobilized on a solid support. Other methods can be used to immobilize the mRNA as well. In some embodiments, the solid support is a bead. In some embodiments, the solid support is a flow cell surface. In some embodiments, the solid surface is the wall of a reaction vessel. In some embodiments, the methods include sequencing nucleic acids preserved or embedded or contained within CE. 
Some embodiments relate to preparing DNA within CE to obtain phasing and sequence assembly information from a target nucleic acid, and obtaining phasing and sequence assembly information from such templates. Particular embodiments relate to the use of integrases, for example transposases, to maintain physical proximity of associated ends of fragmented nucleic acids; and to the use of combinatoric indexing to create individual libraries from each CE. Obtaining haplotype information from CE includes distinguishing between different alleles (e.g., SNPs, genetic anomalies, etc.) in a target nucleic acid. Such methods are useful to characterize different alleles in a target nucleic acid, and to reduce the error rate in sequence information.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2002/012897, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include array compositions comprising a substrate with a surface comprising discrete sites, at least one fiducial, and a population of microspheres comprising at least a first and a second subpopulation. Each subpopulation comprises a bioactive agent, and the microspheres are distributed on said surface. Each subpopulation may optionally comprise a unique optical signature, an identifier binding ligand that will bind a decoder binding ligand such that the identification of the bioactive agent can be elucidated, or both. In an additional aspect, compositions comprising a computer readable memory to direct a computer to function in a specified manner are provided. The computer readable memory comprises an acquisition module for receiving a data image of a random array comprising a plurality of discrete sites, a registration module for registering a data image, and a comparison module for comparing registered data images. Each module comprises computer code for carrying out its function. The registration module may utilize any number of fiducials, including a fiducial fiber when the substrate comprises a fiber optic bundle, a fiducial microsphere, or a fiducial template generated from the random array. In some embodiments, methods of making the array compositions are provided that comprise forming a surface comprising individual sites on a substrate, distributing microspheres on the surface such that the individual sites contain microspheres, and incorporating at least one fiducial onto the surface. When the array has complete rotational freedom, at least two fiducials are preferred in the array to allow for correction of rotation. 
Additionally or alternatively, detection of a genetic biomarker can include methods for comparing separate data images of a random array. The methods comprise using a computer system to register a first data image of the random array to produce a registered first data image, using the computer system to register a second data image of the random array to produce a registered second data image, and comparing the first and the second registered data images to determine any differences between them. Some embodiments provide methods of decoding a random array composition comprising providing a random array composition. A first plurality of decoding binding ligands is added to the array composition and a first data image is created. A fiducial is used to generate a first registered data image. A second plurality of decoding binding ligands is added to the array composition and a second data image is created. The fiducial is used to generate a second registered data image. A computer system is used to compare the first and the second registered data image to identify the location of at least two bioactive agents. Some embodiments provide methods of determining the presence of a target analyte in a sample. The methods comprise acquiring a first data image of a random array composition, and registering the first data image to create a registered first data image. The sample is then added to the random array and a second data image is acquired from the array. The second data image is registered to create a registered second data image. Then the first and the second registered data images are compared to determine the presence or absence of the target analyte. Optionally, the data acquisition may be at different wavelengths. 
Some embodiments provide methods for preprocessing or prefiltering signal data comprising acquiring a data image from an array, and determining the similarity of a first signal from at least one array site to a reference signal to determine whether the site comprises a candidate bead.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/136416, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and systems for identifying splice variants. In one implementation, a method comprises: determining one or more sample splice junctions from a plurality of RNA sequence reads from a single biological sample; retrieving a set of baseline splice junctions determined from a plurality of healthy RNA samples; comparing the one or more sample splice junctions to the set of baseline splice junctions; and identifying one or more filtered sample splice junctions, the filtered sample splice junctions comprising sample splice junctions that do not overlap with the baseline splice junctions, wherein the one or more filtered sample splice junctions are candidate oncogenic events. Some embodiments further comprise outputting the list of candidate oncogenic events. In some embodiments, the plurality of healthy RNA samples comprises healthy RNA samples taken from a cross section of one or more of: geographical regions, ages, genders, ethnic groups, tissue types, or sample preservation qualities. In some embodiments, the plurality of healthy RNA samples comprises samples from one or more tissue types selected from the group consisting of: lung, adrenal gland, bladder, breast, ovary, liver, prostate, skin, and spleen. In some embodiments, the plurality of healthy RNA samples comprises samples from donors across a range of ages. In some embodiments, the baseline splice junctions from the plurality of healthy RNA samples are determined prior to determining the sample junctions from the single sample. In some embodiments, the plurality of healthy RNA samples for the baseline splice junctions are not obtained from the same biological object as the single biological sample. 
In some embodiments, the baseline junctions are from a same genomic region as the sample junctions. In some embodiments, the single biological sample is from a tumor sample. In some embodiments, the sample splice junctions and the baseline splice junctions are both determined using a common assay. In some embodiments, determining the one or more sample junctions comprises: determining the plurality of RNA sequence reads from the single biological sample; retrieving a DNA reference sequence aligned with the RNA sequence reads from the single biological sample; and determining one or more sample junctions as missing contiguous locations in the RNA read compared with the DNA reference. In some embodiments, the filtered sample splice junctions do not overlap with third party junctions, the third party junctions determined from a splice graph that captures multiple alternate combinations of exons for a given gene. In some embodiments, the set of baseline splice junctions are determined without determining a splice graph that captures multiple alternate combinations of exons for a given gene. Some embodiments provide a system for identifying splice variants. The system includes a memory; at least one processor; and at least one non-transitory computer-readable medium containing instructions that, when executed by the at least one processor, cause the at least one processor to perform operations comprising determining one or more sample splice junctions from a plurality of RNA sequence reads from a single biological sample; retrieving a set of baseline splice junctions determined from a plurality of healthy RNA samples; comparing the one or more sample splice junctions to the set of baseline splice junctions; and identifying one or more filtered sample splice junctions, the filtered sample splice junctions comprising sample splice junctions that do not overlap with the set of baseline splice junctions, wherein the filtered sample splice junctions are candidate oncogenic events.
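
By way of non-limiting illustration only, the comparison of sample splice junctions against baseline splice junctions might be sketched as follows. The sketch assumes that each read's alignment is available as a sorted list of half-open reference intervals, that a junction is represented as a (chromosome, start, end) tuple, and that "do not overlap" is interpreted as "no identical junction in the baseline set"; these representations and the function names are hypothetical simplifications rather than the implementation of the referenced publication.

```python
def junctions_from_blocks(chrom, blocks):
    """Derive junctions as the reference gaps between consecutive aligned blocks.

    blocks: sorted list of (start, end) half-open reference intervals covered
    by one RNA read. A gap between two blocks is reported as a junction.
    """
    junctions = []
    for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
        if right_start > left_end:
            junctions.append((chrom, left_end, right_start))
    return junctions


def filter_sample_junctions(sample_junctions, baseline_junctions):
    """Keep sample junctions that are absent from the baseline set."""
    baseline = set(baseline_junctions)
    return [j for j in sample_junctions if j not in baseline]


if __name__ == "__main__":
    sample = junctions_from_blocks("chr1", [(100, 200), (500, 600), (900, 950)])
    baseline = [("chr1", 200, 500)]  # junction seen in healthy samples
    print(filter_sample_junctions(sample, baseline))
```

Junctions that survive the filter would correspond to the candidate events described above; a production system would likely need a tolerance for near-identical coordinates, which this sketch omits.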

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/093780, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a computer implemented method for validating variant calls. The method operates under control of one or more processors executing program instructions for receiving sequencing data including a sample read that has a corresponding sequence of nucleotides along the genomic sequence of interest, receiving an indication of a potential variant call at a designated position within the sequence of nucleotides along the genomic sequence of interest, and obtaining baseline variant frequencies at the designated position within one or more baseline genomic sequences. The method obtains a sample variant frequency at the designated position for the genomic sequence of interest. The method analyzes the baseline and sample variant frequencies at the designated position to obtain a quality score, and validates the potential variant call for the genomic sequence of interest based on the quality score. Optionally, the analyzing operation includes obtaining a relation between the sample variant frequency and a distribution of the baseline variant frequencies, the quality score being based on the relation. Optionally, the analyzing operation comprises indexing the sample variant frequency with respect to a distribution of the baseline variant frequencies. The relation may be based on a non-parametric Wilcoxon rank sum test. The baseline variant frequencies indicate a degree of background noise at corresponding positions along the baseline genomic sequence. Optionally, the validating further comprises comparing the quality score to a threshold, and declaring the potential variant call to be a valid variant call when the quality score exceeds the threshold. 
The baseline variant frequencies may be derived from multiple baseline genomic sequences that are associated with more than one type of allele. Optionally, the method further comprises receiving sequencing data that includes a plurality of reference reads of a sequence of nucleotides along the baseline genomic sequence, and determining the baseline variant frequencies for the reference reads at the designated positions. The determining of the baseline variant frequencies may further comprise receiving the sequencing data from the reference reads for a set of positions within a current base pair window; identifying a candidate variant frequency for one or more positions in the set of positions within the current base pair window; selecting one of the candidate variant frequencies as the baseline variant frequency for a designated position within the reference read; and shifting the base pair window along the baseline genomic sequence and repeating the operations. In accordance with the above embodiments, systems and methods are described to reduce false positive variant calling from systematic errors. Systematic errors may arise due to various factors such as FFPE artifacts, sequencing errors, library preparation errors, PCR errors and the like. Variant calls are statistically subjected to a locus specific background error distribution that may be compiled from a panel of FFPE normal samples with varied DNA quality from various tissues sequenced by the NGS-based assay. The same sequencing data of the FFPE normal samples may also be utilized to normalize systematic bias in read coverage caused by PCR, DNA quality, probe pull-down efficiency, or sequence GC content to reveal the true copy number alterations in a test sample. To further enlarge the signal to noise ratio in CNV calling, additional enhancer probes may be added in the hybrid capture to provide robust estimation of gene amplification. 
Additionally or alternatively, detection of a genetic biomarker can include methods and systems that address noise problems and prevent systematic errors from contributing to false positive variant calls. In connection therewith, a set of normal samples is used to identify systematic bias in order for the system to increase the calling stringency in tumor samples in regions with high background noise. For FFPE samples, normal FFPE samples may be used to construct the baseline. For ctDNA samples, normal genomic DNA data may be used to construct the baseline.
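
By way of non-limiting illustration only, indexing a sample variant frequency against a distribution of baseline variant frequencies, and comparing the resulting quality score to a threshold, might be sketched as follows. The score used here is a simple empirical rank (the fraction of baseline frequencies lying strictly below the sample frequency) and is an assumed stand-in, not the Wilcoxon rank sum test mentioned above; the function names and the default threshold of 0.95 are likewise hypothetical.

```python
def quality_score(sample_vf, baseline_vfs):
    """Empirical rank of the sample variant frequency within the baseline.

    Returns the fraction of baseline variant frequencies that are strictly
    below the sample variant frequency (0.0 to 1.0).
    """
    if not baseline_vfs:
        raise ValueError("at least one baseline variant frequency is required")
    below = sum(1 for b in baseline_vfs if b < sample_vf)
    return below / len(baseline_vfs)


def validate_variant_call(sample_vf, baseline_vfs, threshold=0.95):
    """Declare the call valid when the quality score exceeds the threshold."""
    return quality_score(sample_vf, baseline_vfs) > threshold


if __name__ == "__main__":
    # Hypothetical background frequencies at one position in normal samples.
    baseline = [0.001, 0.002, 0.0, 0.003]
    print(validate_variant_call(0.05, baseline))   # well above background
    print(validate_variant_call(0.002, baseline))  # within background
```

A position with high background noise yields a broad baseline distribution, so a larger sample variant frequency is needed before the score exceeds the threshold, which mirrors the increased calling stringency described above.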

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/068014, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include systems and methods for sequencing polynucleotides. In one embodiment, the system comprises: a memory comprising a reference nucleotide sequence; a processor configured to execute instructions that perform a method comprising: receiving a first nucleotide subsequence of a read from a sequencing system; processing the first nucleotide subsequence using a first alignment path to determine a first plurality of candidate locations of the read on the reference sequence; determining whether the first nucleotide subsequence aligns to the reference sequence based on the determined candidate locations; receiving a second nucleotide subsequence from the sequencing system; processing the second nucleotide subsequence to determine a second plurality of candidate locations of the read that align to the reference sequence using: a second alignment path if the read is aligned to the reference sequence, and the first alignment path if otherwise, wherein the second alignment path is more computationally efficient than the first alignment path to determine the second plurality of candidate locations of the read. In one embodiment, the method comprises: receiving a first nucleotide subsequence from a sequencing system during a sequencing run; and performing a secondary analysis of the first nucleotide subsequence of a read based on a reference sequence using a first analysis path or a second analysis path, wherein the second analysis path is more computationally efficient than the first analysis path in performing the secondary analysis.
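
By way of non-limiting illustration only, the selection between a first alignment path and a more computationally efficient second alignment path might be sketched as follows. The sketch assumes exact string matching against a short in-memory reference, treats the first path as a scan of the entire reference, and treats the second path as a check restricted to candidate locations found for the first subsequence; these choices and all function names are hypothetical and are not drawn from the referenced publication.

```python
def full_search(reference, subsequence):
    """First path: scan the whole reference for exact matches."""
    hits = []
    pos = reference.find(subsequence)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(subsequence, pos + 1)
    return hits


def extend_candidates(reference, candidates, offset, subsequence):
    """Second path: test only the earlier candidate read locations."""
    return [c for c in candidates
            if reference.startswith(subsequence, c + offset)]


def align_incrementally(reference, first, second):
    """Return candidate read start positions after two subsequences."""
    first_hits = full_search(reference, first)
    if first_hits:
        # The read aligned, so the cheaper second path is used.
        return extend_candidates(reference, first_hits, len(first), second)
    # Otherwise fall back to the first path for the second subsequence and
    # report the read start positions implied by each hit.
    return [p - len(first) for p in full_search(reference, second)
            if p >= len(first)]


if __name__ == "__main__":
    ref = "ACGTACGTTTGACGTA"
    print(align_incrementally(ref, "ACGT", "TTG"))
```

The second path is cheaper in this sketch because it inspects only a handful of positions rather than the entire reference, which is the property attributed to the second alignment path above.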

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/057770, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include detection of copy number variations in a biological sample. In one embodiment, copy number variants may be at least a single gene in size. In another embodiment, copy number variants may be at least 140 bp, 140-280 bp, or at least 500 bp. In one embodiment, a “copy number variant” refers to the sequence of nucleic acid in which copy-number differences are found by comparison of a sequence of interest in a test sample with an expected level of the sequence of interest. In some embodiments, a reference sample is derived from a set of sequencing data of unmatched samples to generate normalization information that permits an individual test sample to be normalized such that deviations from expected copy numbers may be determined on normalized sequencing data. The normalization data is generated using the techniques provided and permits normalization to a hypothetical most representative sample matched to the test sample. By normalizing the test sample, noise introduced by sequencing or other bias is removed. In certain embodiments, the raw sequencing data coverage from a targeted sequencing run is normalized to reduce technical and biological noise to improve CNV detection. In one embodiment, samples of interest (e.g., formalin-fixed paraffin-embedded samples) are sequenced according to a desired sequencing technique, such as a targeted sequencing technique that uses a sequencing panel of probes to target regions of interest. Once the sequencing data is collected, the sequencing data is normalized to remove noise, and the normalized data is subsequently analyzed to detect CNVs. 
In some embodiments, a method of normalizing copy number is provided that includes the steps of receiving a sequencing request from a user to sequence one or more regions of interest in a biological sample; acquiring baseline sequencing data from the one or more regions of interest from a plurality of baseline biological samples that are not matched to the biological sample; determining copy number normalization information using the baseline sequencing data, wherein the copy number normalization information comprises at least one copy number baseline for a region of interest of the one or more regions of interest; and providing the copy number normalization information to the user. In another embodiment, a method of detecting copy number variation is provided that includes the steps of acquiring sequencing data from a biological sample, wherein the sequencing data comprises a plurality of raw sequencing read counts for a respective plurality of regions of interest; and normalizing the sequencing data to remove region-dependent coverage. The normalizing comprises: for each region of interest, comparing a raw sequencing read count of one or more bins in a region of interest of the biological sample to a baseline median sequencing read count to generate a baseline-corrected sequencing read count for the one or more bins in the region of interest, wherein the baseline median sequencing read count for one or more bins in the region of interest is derived from a plurality of baseline samples that are not matched to the biological sample and is determined from only the most representative portions of the baseline sequencing data for each region of interest; and removing GC bias from the baseline-corrected sequencing read count to generate a normalized sequencing read count for each region of interest. The method also includes determining copy number variation in each region of interest based on the normalized sequencing read count of the one or more bins in each region of interest. 
In another embodiment, a method of assessing a targeted sequencing panel is provided that includes the steps of identifying a first plurality of targets in a genome for a targeted sequencing panel, wherein the first plurality of targets corresponds to portions of a respective plurality of genes; determining a GC content of each of the first plurality of targets; eliminating targets of the first plurality of targets with GC content outside of a predetermined range to yield a second plurality of targets smaller than the first plurality of targets; when, after the eliminating, an individual gene has fewer than a predetermined number of targets corresponding to portions of the individual gene, identifying additional targets in the individual gene; adding the additional targets to the second plurality to yield a third plurality of targets; and providing a sequencing panel comprising probes specific for the third plurality of targets.
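
By way of non-limiting illustration only, the baseline-correction step of the copy number normalization described above might be sketched as follows. The sketch assumes one read count per region of interest, divides each raw count by the median count for that region across unmatched baseline samples, and labels regions using hypothetical gain and loss thresholds of 1.5 and 0.5; it omits both the selection of the most representative baseline portions and the GC bias removal, and all names are illustrative assumptions.

```python
from statistics import median


def baseline_correct(raw_counts, baseline_samples):
    """Divide each region's raw read count by its baseline median count.

    raw_counts:       dict mapping region name to raw read count.
    baseline_samples: list of dicts with the same keys, one per unmatched
                      baseline sample.
    """
    corrected = {}
    for region, raw in raw_counts.items():
        base = median(sample[region] for sample in baseline_samples)
        corrected[region] = raw / base
    return corrected


def call_copy_number(ratios, gain=1.5, loss=0.5):
    """Label each region from its ratio (thresholds are assumptions)."""
    calls = {}
    for region, ratio in ratios.items():
        if ratio >= gain:
            calls[region] = "gain"
        elif ratio <= loss:
            calls[region] = "loss"
        else:
            calls[region] = "neutral"
    return calls


if __name__ == "__main__":
    raw = {"GENE_A": 200, "GENE_B": 100, "GENE_C": 40}
    baseline = [
        {"GENE_A": 100, "GENE_B": 100, "GENE_C": 100},
        {"GENE_A": 110, "GENE_B": 90, "GENE_C": 95},
        {"GENE_A": 90, "GENE_B": 110, "GENE_C": 105},
    ]
    print(call_copy_number(baseline_correct(raw, baseline)))
```

Using the median rather than the mean makes the baseline less sensitive to a single atypical baseline sample, which is one plausible reason for the baseline median read count recited above.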

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/197027, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for enriching or amplifying polynucleotides, and more specifically methods for enriching or amplifying a target DNA sequence using endonuclease systems, e.g., CRISPR-Cas systems or Argonaute systems, and applications thereof. Additionally or alternatively, detection of a genetic biomarker can include methods for amplifying a target double-stranded nucleic acid using CRISPR-Cas systems. Additionally or alternatively, detection of a genetic biomarker can include a method for amplifying a target double-stranded nucleic acid including: (a) providing a system having: a clustered regularly interspaced short palindromic repeats (CRISPR) RNA (crRNA) or a derivative thereof, and a CRISPR-associated (Cas) protein or a variant thereof, wherein the crRNA or the derivative thereof contains a target-specific nucleotide region complementary to a region of a first strand of the target double-stranded nucleic acid; (b) contacting the target double-stranded nucleic acid with the system to form a complex; (c) hybridizing a primer to a second strand of the target double-stranded nucleic acid, the primer containing a sequence complementary to a region of the second strand of the target double-stranded nucleic acid, and (d) extending a nucleic acid complementary to the second strand of the target double-stranded nucleic acid from the primer using a polymerase. 
Additionally or alternatively,detection of a genetic biomarker can include a method for amplifying atarget double-stranded nucleic acid comprising: (a) providing a firstsystem having: a first clustered regularly interspaced short palindromicrepeats (CRISPR) RNA (crRNA) or a derivative thereof, and a firstCRISPR-associated (Cas) protein or a variant thereof, wherein the firstcrRNA or the derivative thereof contains a target-specific nucleotideregion complementary to a region of a first strand of the targetdouble-stranded nucleic acid; (b) providing second system having: asecond clustered regularly interspaced short palindromic repeats(CRISPR) RNA (crRNA) or a derivative thereof, and a secondCRISPR-associated (Cas) protein or a variant thereof, wherein the secondcrRNA or the derivative thereof contains a target-specific nucleotideregion complementary to a region of a second strand of the targetdouble-stranded nucleic acid; (c) contacting the target double-strandednucleic acid with the first system and the second system; (d)hybridizing a first primer to a second strand of the targetdouble-stranded nucleic acid, the first primer containing a sequencecomplementary to a region of the second strand of the targetdouble-stranded nucleic acid, and hybridizing a second primer to a firststrand of the target double-stranded nucleic acid, the second primercontaining a sequence complementary to a region of the first strand ofthe target double-stranded nucleic acid, and (e) extending the 3′ end ofthe first primer and the second primer with one or more polymerases togenerate a first and a second double stranded target nucleic acid. Insome embodiments, the method further includes repeating step (a) andstep (e) for one or more times, e.g., until a desired degree ofamplification is reached. 
Additionally or alternatively, detection of agenetic biomarker can include a method for amplifying a targetdouble-stranded nucleic acid including: (a) providing a system having: a5′ phosphorylated single-stranded nucleic acid or a derivative thereof,and an Argonaute protein or a variant thereof, wherein the 5′phosphorylated single-stranded nucleic acid or the derivative thereofcontains a target-specific nucleotide region complementary to a regionof a first strand of the target double-stranded nucleic acid; (b)contacting the target double-stranded nucleic acid with the system toform a complex; (c) hybridizing a primer to a second strand of thetarget double-stranded nucleic acid, the primer containing a sequencecomplementary to a region of the second strand of the targetdouble-stranded nucleic acid, and (d) extending a nucleic acidcomplementary to the second strand of the target double-stranded nucleicacid from the primer using a polymerase. Additionally or alternatively,detection of a genetic biomarker can include a method for enriching atarget nucleic acid including: obtaining a population of cell free DNA(cfDNA) from a subject's plasma or serum, the population of cell freeDNA containing the target nucleic acid; providing a system having: a 5′phosphorylated single-stranded nucleic acid or a derivative thereof, andan Argonaute protein or a variant thereof, wherein the 5′ phosphorylatedsingle-stranded nucleic acid or the derivative thereof contains atarget-specific nucleotide region complementary to a region of thetarget nucleic acid; contacting the target nucleic acid with theendonuclease system to form a complex, and separating the complex andthereby enriching for the target nucleic acid. 
Additionally oralternatively, detection of a genetic biomarker can include a method fordetecting single nucleotide variant (SNV) including: obtaining apopulation of cell free DNA from a subject's plasma or serum; providinga first system having: a first 5′ phosphorylated single-stranded nucleicacid or a derivative thereof, and a first Argonaute protein or a variantthereof, wherein the first 5′ phosphorylated single-stranded nucleicacid or the derivative thereof contains a first target-specificnucleotide region complementary to a region of a first target nucleicacid, and wherein the first Argonaute protein has nuclease activity;cleaving the first target nucleic acid using the first endonucleasesystem, and amplifying a second target nucleic acid using PolymeraseChain Reaction (PCR), wherein the second target nucleic acid contains asingle nucleotide variant version of the first target nucleic acid.Additionally or alternatively, detection of a genetic biomarker caninclude a method for labeling a target nucleic including providing afirst system having: a 5′ phosphorylated single-stranded nucleic acid ora derivative thereof, and a first Argonaute protein or a variantthereof, wherein the first 5′ phosphorylated single-stranded nucleicacid or the derivative thereof contains a first target-specificnucleotide region complementary to a first region of the target nucleicacid, and wherein the first Argonaute protein is capable of generating asingle-stranded nick; contacting a double-stranded nucleic acidcontaining the target nucleic acid with the first nuclease system togenerate a first single-stranded nick at the first region of the targetnucleic acid, and labeling the target nucleic acid. In some embodiments,the method further includes separating the target nucleic acid throughthe labeling and thereby enriching the target nucleic acid. In someembodiments, the method further includes amplifying the target nucleicacid. 
In some embodiments, the method further includes providing asecond system having: a second 5′ phosphorylated single-stranded nucleicacid or a derivative thereof, and a second Argonaute protein or avariant thereof, wherein the second 5′ phosphorylated single-strandednucleic acid or the derivative thereof contains a second target-specificnucleotide region complementary to a second region of the target nucleicacid, and wherein the second Argonaute protein is capable of generatinga single-stranded nick, and contacting the double-stranded nucleic acidcontaining the target nucleic acid with the second nuclease system togenerate a second single-stranded nick at the second region of thetarget nucleic acid, wherein the first region of the target nucleic acidis different from the second region of the target nucleic acid.Additionally or alternatively, detection of a genetic biomarker caninclude a method for enriching a target nucleic acid including:providing a population of Argonaute proteins programmed with a set of 5′phosphorylated single-stranded nucleic acids, wherein the set of 5′phosphorylated single-stranded nucleic acids contains 5′ phosphorylatedsingle-stranded nucleic acids complementary to a series of differentregions of the target nucleic acid; contacting the target nucleic acidwith the population of Argonaute proteins programmed with the set of 5′phosphorylated single-stranded nucleic acids to generate a series ofnucleic acid fragments, and ligating adaptors to at least one of nucleicacid fragments, wherein the Argonaute proteins are capable of generatingdouble-stranded DNA breaks. 
Additionally or alternatively, detection ofa genetic biomarker can include a method for sequencing a target nucleicacid including: providing a population of Argonaute proteins programmedwith a set of 5′ phosphorylated single-stranded nucleic acids, whereinthe set of 5′ phosphorylated single-stranded nucleic acids contains 5′phosphorylated single-stranded nucleic acids complementary to a seriesof different regions across the target nucleic acid; contacting thetarget nucleic acid with the population of Argonaute proteins programmedwith the set of 5′ phosphorylated single-stranded nucleic acids togenerate a series of nucleic acid fragments, and sequencing the seriesof nucleic acid fragments. Additionally or alternatively, detection of agenetic biomarker can include a method for sequencing a target nucleicacids including: providing a plurality of populations of Argonauteproteins, each population of Argonaute proteins being programmed with adifferent set of 5′ phosphorylated single-stranded nucleic acids,wherein each set of 5′ phosphorylated single-stranded nucleic acidscontains 5′ phosphorylated single-stranded nucleic acids complementaryto a different series of regions across the target nucleic acid,contacting the target nucleic acid with each of the plurality ofpopulations of Argonaute proteins in a separate reaction to generate adifferent series of nucleic acid fragments, and sequencing the nucleicacid fragments.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2015/198074, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, apparatus, systems and computer program products for presenting sequence information. In some embodiments, this includes obtaining a first sequence and a second sequence, determining a similarity between the first sequence and the second sequence, wherein the similarity is based upon distance between the first sequence and the second sequence, and displaying a block at an intersection point on a matrix plot based on the similarity between the first sequence and the second sequence.
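As a rough illustration of a distance-based similarity matrix of the kind described above, the sketch below computes pairwise similarities that could then be drawn as blocks on a matrix plot. The text does not specify the distance measure; Levenshtein edit distance, the normalization by the longer sequence length, and all function names are assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1 minus the distance normalized by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def similarity_matrix(sequences):
    """Value at row i, column j is the similarity of sequences i and j."""
    return [[similarity(a, b) for b in sequences] for a in sequences]
```

Each cell of the returned matrix corresponds to one intersection point; a plotting library could shade the block at that point according to the value.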

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2002/099982, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a composition that includes a substrate with a surface comprising discrete sites, a reflective coating on the surface, and a population of microspheres distributed on the substrate. The microspheres comprise at least a first and a second subpopulation. Generally, at least one subpopulation comprises a bioactive agent. Additionally or alternatively, detection of a genetic biomarker can include a composition wherein the substrate comprises a first and a second surface, wherein the first surface comprises the discrete sites, and the reflective coating is on the second surface. The population of microspheres is distributed on the first surface. Additionally or alternatively, detection of a genetic biomarker can include a method of making a reflective array. The method includes providing a substrate with a surface comprising discrete sites, applying to the surface a coating of reflective material, and distributing microspheres on the surface. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a non-labeled target analyte in a sample comprising providing a substrate with a plurality of discrete sites, distributing on the sites a population of microspheres comprising a bioactive agent and a signal transducer element, and contacting the substrate with the sample, whereby upon binding of the target analyte to the bioactive agent, a signal from the signal transducer element is altered as an indication of the presence of the target analyte.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2002/016649, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of detecting a target nucleic acid. The method comprises contacting the target nucleic acid with an adapter sequence such that the target nucleic acid is joined to the adapter sequence to form a modified target nucleic acid. In addition, the method comprises contacting the modified target nucleic acid with an array comprising a substrate with a surface comprising discrete sites and a population of microspheres comprising at least a first subpopulation comprising a first capture probe, such that the first capture probe and the modified target nucleic acid form a complex, wherein the microspheres are distributed on the surface, and detecting the presence of the target nucleic acid. In addition, the method comprises adding at least one decoding binding ligand to the array such that the identity of the target nucleic acid is determined.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,914,973, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods, compositions, and kits for detecting gene dysregulations suchas those arising from gene fusions and chromosomal translocations. Themethods, compositions and kits are useful for detecting mutations thatcause the differential expression of a 5′ region of a target generelative to the 3′ region of the target gene. Additionally oralternatively, detection of a genetic biomarker can include a method fordetecting the presence or absence of a dysregulation in a target gene ina test sample. In one embodiment, the method includes: (a) amplifyingportions of a 5′ region of a transcript of the target gene or a cDNAderived therefrom, if present in a test sample, with two or moredifferent 5′ target primer pairs that are directed to the portions ofthe 5′ region of the target gene; (b) amplifying portions of a 3′ regionof a transcript of the target gene or a cDNA derived therefrom, ifpresent in the test sample, with two or more different 3′ target primerpairs that are directed to the portions of 3′ region of the target gene;(c) detecting the amplification products produced by the two or more 5′target primer pairs and the two or more 3′ target primer pairs; (d)determining the average cycle threshold (Ct) among the two or more 5′target primer pairs and the average Ct among the two or more 3′ targetprimer pairs, (e) calculating an IDE Score as the difference between theaverage cycle threshold among the 5′ target primer pairs and the averagecycle threshold among the 3′ target primer pairs, and (f) identifyingthe test sample as (i) having a target gene dysregulation if the IDEScore is significantly different than a cutoff value and the differenceindicates the presence of a target gene 
dysregulation, or (ii) nothaving a target gene dysregulation if the IDE Score in the test sampledoes not differ significantly from the cutoff value. Additionally oralternatively, detection of a genetic biomarker can include a method fordiagnosing the presence or absence of cancer or a susceptibility tocancer in a subject. In one embodiment, the method includes: (a)obtaining a test sample that comprises nucleic acid from the subject;(b) amplifying portions of a 5′ region of a transcript of a target geneor a cDNA derived therefrom, if present in the test sample, with two ormore different 5′ target primer pairs that are directed to the portionsof the 5′ region of the target gene; (c) amplifying portions of a 3′region of a transcript of the target gene or a cDNA derived therefrom,if present in the test sample, with two or more different 3′ targetprimer pairs that are directed to the portions of the 3′ region of thetarget gene; (d) detecting the amplification products produced by thetwo or more 5′ target primer pairs and the two or more 3′ target primerpairs; (e) determining the average cycle threshold (Ct) among the two ormore 5′ target primer pairs and the average Ct among the two or more 3′target primer pairs; (f) calculating an IDE Score as the differencebetween the average cycle threshold among the 5′ target primer pairs andthe average cycle threshold among the 3′ target primer pairs, and (g)diagnosing the subject as (i) having cancer or a susceptibility tocancer when the IDE Score is significantly different than a cutoff valueand the difference indicates the presence of cancer or a susceptibilityto cancer, or (ii) not having cancer or a susceptibility to cancerresulting from dysregulation of the target gene if the IDE Score in thetest sample does not differ significantly from the cutoff value. 
In some embodiments, the expression level of the 5′ region of a target gene is determined by amplification using two, three, four, five or six different primer pairs directed to various portions of the 5′ region of the target gene. Similarly, two, three, four, five or six different primer pairs directed to various portions of the 3′ region of the target gene may be used to determine the expression level of the 3′ region of the target gene. The amounts of amplification products each may be normalized to the amount of an endogenous control gene transcript (“Control”) such as, for example, ABL. In some embodiments, the expression level or relative amount of transcript can be determined using real-time PCR and comparing the threshold cycle (Ct) for each amplicon. The average Ct values for each of the 3′ (avgCt3′) and 5′ (avgCt5′) regions of a target gene are used to calculate an IDE Score, which may be calculated as IDE = (avgCt5′ − avgCt3′), or IDE = (avgCt5′)/(Ctcontrol) − (avgCt3′)/(Ctcontrol), or IDE = [Ln((avgCt5′)/Ctcontrol)] − [Ln((avgCt3′)/Ctcontrol)]. In some embodiments, the Ct values are normalized to a reference sample.
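The three IDE Score formulas given above can be written out directly. The sketch below is illustrative only: the function names are hypothetical, the inputs are assumed to be lists of per-primer-pair Ct values for the 5′ and 3′ regions plus a single control Ct, and the example Ct values in the usage note are invented for demonstration rather than taken from any assay.

```python
import math
from statistics import mean


def ide_difference(ct_5prime, ct_3prime):
    """IDE = avgCt5' - avgCt3'."""
    return mean(ct_5prime) - mean(ct_3prime)


def ide_control_ratio(ct_5prime, ct_3prime, ct_control):
    """IDE = avgCt5'/Ctcontrol - avgCt3'/Ctcontrol."""
    return mean(ct_5prime) / ct_control - mean(ct_3prime) / ct_control


def ide_log_ratio(ct_5prime, ct_3prime, ct_control):
    """IDE = Ln(avgCt5'/Ctcontrol) - Ln(avgCt3'/Ctcontrol)."""
    return (math.log(mean(ct_5prime) / ct_control)
            - math.log(mean(ct_3prime) / ct_control))
```

For instance, with made-up 5′ Ct values of 24 and 26, 3′ Ct values of 20 and 22, and a control Ct of 20, the difference form gives an IDE Score of 4. Comparing the score against a cutoff value is assay-specific and is not shown here.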

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,783,854, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods and compositions for detecting target nucleic acids at very lowlevels and in the presence of large amounts of non-target nucleic acids.Generally, a target and non-target nucleic acid are distinguished by thepresence or absence of a fragmentation site, such as a restrictionenzyme recognition site. By differentiating the target and non-target bya fragmentation site, the methods and compositions can be used withvarious nucleic acid detection methods known in the art, such as PCR.Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting the presence or absence of a targetnucleic acid by testing a sample that potentially contains the targetnucleic acid in the presence of non-target nucleic acid, the methodincludes: a) fragmenting the sample nucleic acid under conditions suchthat a subsequent amplification directed to the target nucleic acidresults in an increased detection of the target nucleic acid over thenon-target nucleic acid as compared to amplification withoutfragmentation; b) amplifying the target nucleic acid with a pair ofprimers, where a first primer is specific for the target nucleic acid;and c) detecting the presence or absence of an amplification product,which indicates the presence or absence of the target nucleic acid inthe sample. 
Additionally or alternatively, detection of a genetic biomarker can include a method for diagnosing a cancer or detecting the presence of a tumor cell by determining if an individual has a mutant sequence associated with the cancer or tumor cell type, the method including: a) obtaining a sample including nucleic acid from the individual; b) fragmenting the sample nucleic acid under conditions such that a subsequent amplification directed to the target nucleic acid results in an increased detection of the target nucleic acid over the non-target nucleic acid as compared to amplification without fragmentation; c) amplifying the target nucleic acid with a pair of primers, where a first primer is specific for the target nucleic acid; and d) detecting the presence or absence of an amplification product containing the mutant sequence, where diagnosis of cancer is determined by the presence, absence, or amount of amplification product containing the mutant sequence. Additionally or alternatively, detection of a genetic biomarker can include a method for determining a prognosis for an individual with cancer by determining if the individual has a mutant sequence associated with the cancer, the method including: a) obtaining a sample containing nucleic acid from the individual; b) fragmenting the mutant nucleic acid under conditions such that a subsequent amplification directed to the mutant nucleic acid results in an increased detection of the mutant nucleic acid over the non-mutant nucleic acid as compared to amplification without fragmentation; c) amplifying the mutant nucleic acid with a pair of primers, where a first primer is specific for the mutant nucleic acid; and d) detecting the presence, absence and/or amount of an amplification product containing the mutant sequence, where the likelihood of an outcome in the individual is associated with the presence and/or amount of mutant nucleic acid sequence. 
Additionally or alternatively, detection of a genetic biomarker can include a method for determining drug sensitivity of an individual diagnosed with cancer, the method including: a) obtaining a sample comprising nucleic acid from the individual; b) fragmenting the mutant nucleic acid under conditions such that a subsequent amplification directed to the mutant nucleic acid results in an increased detection of the mutant nucleic acid over the non-mutant nucleic acid as compared to amplification without fragmentation; c) amplifying the mutant nucleic acid with a pair of primers, where a first primer is specific for the mutant nucleic acid; d) detecting the presence, absence and/or amount of an amplification product containing the mutant sequence; and e) relating the presence, absence and/or amount of an amplification product containing the mutant sequence to cancer drug sensitivity. In some embodiments, the mutated nucleic acid sequence is due to a deletion, insertion, substitution and/or translocation or combinations thereof. In preferred embodiments, fragmentation of the nucleic acid sequence is performed with a restriction enzyme that cleaves the wild-type sequence. Such pre-amplification digestion treatment allows for fragmentation to destroy or substantially decrease the number of wild-type sequences that might be amplified. In yet more preferred embodiments, the fragmentation using a restriction enzyme is combined with the use of a mutation specific primer (or mutated sequence primer). In preferred embodiments, a mutated sequence destroys or disrupts a restriction enzyme recognition site present in the corresponding wild-type sequence, and a mutation specific primer can be designed to bind to the mutated version of the sequence and not its wild-type counterpart. 
For example, a mutation specific primer can overlap a border region, which is a region that contains a portion of a wild-type sequence adjacent to a portion of the mutated sequence. In one approach, a sample is assayed for the presence or absence of a mutated sequence by amplification and detection of the resulting amplification products. In a preferred embodiment, amplification of target nucleic acids is accomplished by polymerase chain reaction (PCR). Single or multiple mutant sequences can be assayed. Amplification of multiple mutant sequences can be performed simultaneously in a single reaction vessel, e.g., multiplex PCR. In this case, probes may be distinguishably labeled and/or amplicons may be distinguishable by size differentiation. Alternatively, the assay could be performed in parallel in separate reaction vessels. In the latter case, the probes could have the same label. In some embodiments, the methods further comprise a nucleic acid extraction step. In some embodiments, at least one primer of each primer pair in the amplification reaction is labeled with a detectable moiety. Thus, following amplification, the various target segments can be identified by size and color. The detectable moiety is preferably a fluorescent dye. In some embodiments, different pairs of primers in a multiplex PCR may be labeled with different distinguishable detectable moieties. Thus, for example, HEX and FAM fluorescent dyes may be present on different primers in multiplex PCR and associated with the resulting amplicons. In other embodiments, the forward primer is labeled with one detectable moiety, while the reverse primer is labeled with a different detectable moiety, e.g., FAM dye for a forward primer and HEX dye for a reverse primer. Use of different detectable moieties is useful for discriminating between amplified products which are of the same length or are very similar in length. 
Thus, in certain embodiments, at least two different fluorescent dyes are used to label different primers used in a single amplification. In still another embodiment, control primers can be labeled with one moiety, while the patient (or test sample) primers can be labeled with a different moiety, to allow for mixing of both samples (post PCR) and the simultaneous detection and comparison of signals of the normal and test samples. In a modification of this embodiment, the primers used for control samples and patient samples can be switched to allow for further confirmation of results.
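The restriction-site strategy described above, in which a mutation destroys a recognition site that is present in the wild-type sequence so that digestion removes wild-type template before amplification, can be illustrated with a simple in silico check. This is a sketch under stated assumptions: the function names and the example sequences are hypothetical, only exact matches to a literal recognition sequence are considered, and GAATTC (the EcoRI recognition sequence) is used purely as a familiar example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_site(seq, site):
    """True if the recognition site occurs on either strand of the sequence."""
    seq, site = seq.upper(), site.upper()
    return site in seq or reverse_complement(site) in seq


def mutation_disrupts_site(wild_type, mutant, site):
    """True if digestion would cut the wild-type sequence but spare the mutant."""
    return has_site(wild_type, site) and not has_site(mutant, site)


# Hypothetical example: a single A>G substitution inside a GAATTC site.
wild_type = "TTAGGAATTCCGA"
mutant = "TTAGGAGTTCCGA"
spared = mutation_disrupts_site(wild_type, mutant, "GAATTC")
```

Here `spared` is true because the substitution removes the only GAATTC site, so a digest with that enzyme would be expected to fragment the wild-type template while leaving the mutant template intact for amplification.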

Analysis of amplified products from amplification reactions, such as multiplex PCR, can be performed using an automated DNA analyzer such as an automated DNA sequencer (e.g., ABI PRISM 3100 Genetic Analyzer) which can evaluate the amplified products based on size (determined by electrophoretic mobility) and/or respective fluorescent label.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,546,404, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods, compositions, and kits directed to the detection of genedysregulations such as those arising from gene fusions and chromosomalabnormalities, e.g., translocations, insertions, inversions anddeletions. In some embodiments, the methods, compositions and kits areuseful for detecting mutations that cause the differential expression ofa 5′ region of a target gene relative to the 3′ region of the targetgene. Additionally or alternatively, detection of a genetic biomarkercan include a method for detecting a dysregulation in a target gene. Themethod may include: (a) amplifying a 5′ region of the target genetranscript, if present, in a biological sample with one or more 5′target primer pairs which are complementary to the 5′ region of thetarget gene; (b) amplifying a 3′ region of the target gene transcript,if present, in the biological sample with one or more 3′ target primerpairs which are complementary to the 3′ region of the target gene; and(c) detecting the amounts of amplification product produced by the oneor more 5′ target primer pairs and the one or more 3′ target primerpairs. The method may also provide that a difference in the amounts ofamplification products produced by steps (a) and (b) indicates that thetarget gene is dysregulated. Additionally or alternatively, detection ofa genetic biomarker can include a method for detecting the presence orabsence of a dysregulation in a target gene in a sample. 
The method mayinclude: (a) measuring the amount of transcription of a 5′ region of thetarget gene and a 3′ region of the target gene in the test sample; and(b) comparing the relative expression of the 5′ region to the 3′ regionof the target gene in the test sample to the relative expression of the5′ region to the 3′ region of the target gene in a reference sample. Themethod may also provide that a difference in the relative expression inthe test sample compared to the reference sample is indicative of thepresence of a gene dysregulation. In an embodiment, the relative amountof transcript can be determined using real-time PCR and comparing thethreshold cycle, or Ct, value, for each amplicon. The Ct value can benormalized to a reference sample. Additionally or alternatively,detection of a genetic biomarker can include a method for diagnosingcancer or a susceptibility to cancer in a subject. The method mayinclude: (a) amplifying a 5′ region of the target gene transcript, ifpresent, in a biological sample with one or more 5′ target primer pairswhich are complementary to the 5′ region of the target gene; (b)amplifying a 3′ region of the target gene transcript, if present, in thebiological sample with one or more 3′ target primer pairs which arecomplementary to the 3′ region of the target gene; and (c) detecting theamounts of amplification product produced by the one or more 5′ targetprimer pairs and the one or more 3′ target primer pairs. The method mayalso provide that a difference in the amounts of amplification productsproduced by steps (a) and (b) indicates that the subject has cancer oris susceptible to cancer resulting from a gene dysregulation.Additionally or alternatively, detection of a genetic biomarker caninclude a method for diagnosing prostate cancer or a susceptibility toprostate cancer in a subject. 
The method may include: (a) amplifying a 5′ region of the target gene transcript, if present, in a biological sample with one or more 5′ target primer pairs which are complementary to the 5′ region of the target gene; (b) amplifying a 3′ region of the target gene transcript, if present, in the biological sample with one or more 3′ target primer pairs which are complementary to the 3′ region of the target gene; and (c) detecting the amounts of amplification product produced by the one or more 5′ target primer pairs and the one or more 3′ target primer pairs. The method may also provide that a difference in the amounts of amplification products produced by steps (a) and (b) indicates that the target gene is dysregulated. Optionally, the nucleic acid sample containing the target gene of interest may be subjected to another analysis to determine the nature of the gene dysregulation. Suitable analyses include, for example, comparative hybridization (e.g., comparative genomic hybridization). Comparative hybridization techniques such as comparative genomic hybridization (CGH) are limited in that they are only able to detect unbalanced rearrangements (rearrangements that lead to gain or loss of genetic material). Comparative hybridization cannot adequately detect chromosomal abnormalities such as balanced translocations. Thus, any of the methods may be used in combination with a comparative hybridization technique. The combination of the methods with comparative hybridization (e.g., CGH) will be able to detect both balanced and unbalanced rearrangements and provide a more accurate diagnosis than if the comparative hybridization technique were used alone. In the case of unbalanced rearrangements, the comparative hybridization technique may be used as a confirmatory assay. In some embodiments, target gene dysregulations may arise from gene fusions and chromosomal abnormalities including, for example, translocations, deletions, inversions, and insertions. 
In someembodiments, the biological sample is contacted with the one or more 5′target primer pairs and the one or more 3′ target primer in a multiplexamplification reaction. In one embodiment, the detecting is accomplishedusing a labeled oligonucleotide probe complementary to eachamplification product. For example, each oligonucleotide probe mayinclude a different detectable label, such as a donor fluorophore andquencher moiety. In another embodiment, at least one of the primers forthe 5′ region and/or at least one of the primers for the 3′ region isdetectably labeled, preferably with different detectable labels. Inillustrative embodiments, the amplifying is performed using quantitativeRT-PCR, e.g., real-time RT-PCR. In some embodiments, the chromosomalabnormality is selected from the group consisting of: a translocation, adeletion, an inversion, and an insertion. In one embodiment, thebiological sample is a sample from a subject to be tested for achromosomal abnormality. In some embodiments, the methods furtherinclude amplifying a region of an endogenous control gene transcriptpresent in the biological sample with a primer pair complementary to theendogenous control gene and detecting the amplification of the region ofthe endogenous control gene. In some embodiments, the amount ofamplified target gene transcripts (i.e., the 5′ region and the 3′region) may be normalized to the amount of amplified endogenous controlgene transcript. In some embodiments, the method further includes: (a)measuring the amount of transcription of a 5′ region of a second targetgene and a 3′ region of the second target gene in the test sample; and(b) comparing the relative expression of the 5′ region to the 3′ regionof the second target gene in the test sample to the relative expressionof the 5′ region to the 3′ region of the second target gene in areference sample. 
The method may also provide that a difference in the relative expression of both the target gene and the second target gene in the test sample compared to the reference sample is indicative of the presence of a target gene:second target gene translocation. Suitable biological samples include, for example, whole blood, isolated blood cells, plasma, serum, and urine.
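
The 5′/3′ comparison described above can be illustrated with a minimal sketch. It is not taken from the referenced methods: the function names, the dictionary keys ("5p", "3p"), and the two-fold (1.0 log2 unit) threshold are assumptions chosen only for illustration, and the amounts are assumed to be positive quantities already derived from the amplification data.

```python
import math

def normalize(amount, control_amount):
    """Express an amplified amount relative to the endogenous control transcript."""
    return amount / control_amount

def log2_5p_3p_ratio(amount_5p, amount_3p):
    """log2 of the 5' region amount over the 3' region amount for one sample."""
    return math.log2(amount_5p / amount_3p)

def compare_to_reference(test, reference, threshold=1.0):
    """Compare the 5'/3' ratio of a test sample with that of a reference sample.

    `test` and `reference` are dicts with keys "5p" and "3p" (hypothetical names).
    `threshold` is an illustrative cutoff in log2 units, not a validated value.
    """
    delta = (log2_5p_3p_ratio(test["5p"], test["3p"])
             - log2_5p_3p_ratio(reference["5p"], reference["3p"]))
    return {"delta_log2": delta, "dysregulated": abs(delta) >= threshold}
```

Because the control amount cancels in a within-sample 5′/3′ ratio, `normalize` is kept as a separate helper for comparing absolute levels between samples.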

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,911,942, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of performing comparative hybridization by comparing the amount of test and reference nucleic acids hybridized to a nucleic acid array, the amounts determined by detecting a signal from the hybridized nucleic acids which are labeled with the same detectable label. This method is applicable to comparative hybridization methods in general and to comparative genomic hybridization (CGH) in particular. Accordingly, reference to CGH where the test and reference nucleic acid is genomic nucleic acid should be understood to encompass methods where the test and reference nucleic acids are other than genomic nucleic acids. In a preferred embodiment, CGH is performed using two samples of genomic nucleic acids: a test sample containing genomic nucleic acids, and a reference or control sample containing genomic nucleic acids with no known chromosomal or genetic abnormalities. The test sample and the reference sample are co-hybridized to a nucleic acid array that contains a plurality of nucleic acids or nucleic acid segments spotted onto a surface (such as a glass slide) at discrete locations. The array may contain target nucleic acid markers for certain known genetic mutations or disease states, or may represent (in aggregate) an entire chromosome, or the full chromosomal complement to obtain a genetic profile similar to karyotyping. In these approaches, the detectable label may be attached to the test and reference nucleic acids before hybridization or after hybridization. In another approach, the detectable label may be attached to one of the test or reference nucleic acids before hybridization while the label is attached to the other of the test or reference nucleic acid after hybridization. 
The detectable label may be attached covalently or non-covalently such as by a ligand-receptor interaction or by hybridization between complementary nucleotide sequences. In some embodiments, the comparative hybridization can be done using the same detectable label by another approach that may be referred to as an “additive” approach. In accordance with this approach, the test sample nucleic acids comprise a first tag, and the reference sample nucleic acids comprise a second tag. Following hybridization, the surface is contacted with a first complex containing a detectable label and a first entity, such that the first complex selectively binds with the first tag. The next step comprises determining the location and amount of the detectable label bound to the array surface (i.e., to “read” the array). Once the array is read to determine the amount of detectable label associated with nucleic acid that comprises the first tag, the surface is then contacted with a second complex containing the same detectable label as present in the first complex and containing a second entity, such that the second complex selectively binds with the second tag. The array is then read a second time to determine the location and amount of the total detectable label representing both nucleic acids hybridized to the surface. The last step comprises using the results of the two reads to determine the amount of the hybridized nucleic acid that is associated with the second tag. In a preferred approach, the first read is subtracted from the second read to obtain the signal representing the nucleic acid that is linked to the second tag. The signal from the two samples thus determined can be used to identify differences between the test sample genomic nucleic acids and the reference sample genomic nucleic acids so as to detect any chromosomal or genetic abnormalities associated with the test sample nucleic acid. 
In some embodiments, the amount of the hybridized nucleic acid that is associated with the second tag can be determined using a duplicate of the hybridized array which has not been contacted with the first complex. Thus, the duplicate array is contacted with the second complex and not the first complex. The signal from this second array directly represents the amount of hybridized nucleic acid with the second tag, which can be compared to the amount of signal from the first array that was contacted only with the first complex and represents the amount of hybridized nucleic acid associated with the first tag. Because the two analyses are independent of each other, each array may be processed in any order or simultaneously. In some embodiments, one may first hybridize the array with the test and reference nucleic acids wherein one of the test and reference nucleic acids has already been labeled (e.g., by random priming). The array is then read after hybridization to determine the signal corresponding to the particular labeled nucleic acid sample. The array is then contacted with a complex comprising a detectable label and an entity, wherein the complex selectively reacts with the other of the test or reference nucleic acid via a tag attached to said other of the test or reference nucleic acid. The array is read again to measure the total signal for both hybridized nucleic acids. The next step comprises using the results of the two reads to determine the amount of the hybridized nucleic acid that is associated with the tag. In a preferred approach, the first read is subtracted from the second read to obtain the signal representing the nucleic acid that was linked to the tag. The signal from the two samples thus determined can be used to identify differences between the test sample genomic nucleic acids and the reference sample genomic nucleic acids so as to detect any chromosomal or genetic abnormalities associated with the test sample nucleic acid. 
In some embodiments, the amount of the hybridized nucleic acid that is associated with the second tag can be determined using a duplicate of the hybridized array, except that the duplicate is prepared by hybridizing to test and reference nucleic acids that do not contain a detectable label. In this case, the duplicate array is contacted with a complex comprising a detectable label and an entity, wherein the complex selectively reacts with the other of the test or reference nucleic acid via a tag attached to said other of the test or reference nucleic acid. The signal from this second array directly represents the amount of hybridized nucleic acid with the second tag, which can be compared to the amount of signal from the first array that was contacted only with the first complex and represents the amount of hybridized nucleic acid associated with the first tag. Because the two analyses are independent of each other, each array may be processed in any order or simultaneously. Additionally or alternatively, detection of a genetic biomarker can include a method of comparing the expression of genes in a test sample versus that of a reference sample. The first step of the method includes contacting, under hybridization conditions, cDNA prepared from mRNA of a test sample and cDNA prepared from mRNA of a reference sample to a surface containing a plurality of nucleic acid segments each immobilized at discrete locations on the surface. In this case, the test sample cDNA and the reference sample cDNA are labeled before or after hybridization with the same detectable label, which is linked to the cDNA of the test sample via a first linkage and to the cDNA of the reference sample via a second linkage. Either the first linkage or the second linkage is susceptible to selective removal. The location and amount of detectable label linked to nucleic acids hybridized to the surface of the support are determined. 
The label is then selectively removed from either the hybridized test sample cDNA or the hybridized reference sample cDNA. The location and amount of the detectable label remaining on the support is then determined and represents one of the samples. The difference between the location and amount remaining after removal and the location and amount prior to removal represents the other of the samples. The relative amount of each sample nucleic acid hybridized to the array reflects the expression of genes in the test sample compared to the reference sample.
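
The arithmetic of the "additive" approach described above (the first read represents the first tag, and the second read minus the first read represents the second tag) can be sketched as follows. This is an illustrative sketch only: the function names and the small `floor` value used to avoid division by zero are assumptions, and per-spot signals are assumed to be background-corrected numbers supplied in the same spot order for both reads.

```python
import math

def additive_signals(first_read, second_read):
    """Split two sequential reads of one array into per-spot tag signals.

    first_read:  per-spot signal after binding the first complex (first tag only).
    second_read: per-spot signal after also binding the second complex (both tags).
    Returns a list of (first_tag_signal, second_tag_signal) pairs.
    """
    if len(first_read) != len(second_read):
        raise ValueError("both reads must cover the same spots")
    return [(r1, r2 - r1) for r1, r2 in zip(first_read, second_read)]

def log2_ratios(pairs, floor=1e-9):
    """Per-spot log2 ratio of first-tag signal to second-tag signal.

    `floor` is a hypothetical guard against zero or negative signals.
    """
    return [math.log2(max(a, floor) / max(b, floor)) for a, b in pairs]
```

With the test sample carrying the first tag, a positive log2 ratio at a spot would suggest relative gain in the test sample and a negative value relative loss, subject to whatever thresholds a given assay adopts.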

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,871,687, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for determining a sequence of contiguous bases within a polynucleotide, the method relying on single-base primer extension using labeled dideoxynucleotide terminators. The primers are immobilized to solid supports (e.g., microspheres or two-dimensional arrays), allowing for the identification of the labeled terminator incorporated into each primer. Data on the incorporated terminators is used to determine the base identity of a contiguous sequence of nucleotides in a target nucleic acid. Additionally or alternatively, detection of a genetic biomarker can include a method for determining a contiguous sequence comprising at least four bases of a target nucleic acid comprising: (a) preparing one or more reaction mixtures containing the target nucleic acid and four or more primers complementary to a portion of the target nucleic acid such that the primers each have a 3′ end located 5′ to each nucleotide position of the sequence to be determined, wherein each reaction contains from one to all of said primers in any combination and is under conditions where the primers anneal to the target nucleic acid; (b) extending the one or more primers from step (a) with a polymerase in the presence of one or more labeled dideoxynucleotides; (c) immobilizing said primers to a solid support; and (d) detecting the label of the dideoxynucleotide incorporated into each primer and utilizing this information to determine said contiguous sequence of at least four bases of the target nucleic acid. The primers may be extended before immobilization to the solid support, i.e., step (b) occurs before step (c), or extended after immobilization on the solid support, i.e., step (c) occurs before step (a). 
In one embodiment, at least two differently-labeled dideoxynucleotides are provided in the same reaction mixture. In another embodiment, four differently-labeled dideoxynucleotides are provided in the same reaction mixture. Additionally or alternatively, detection of a genetic biomarker can include methods for determining a contiguous sequence of four or more bases of a target nucleic acid by performing singleplex single-base primer extension reactions or multiplex single-base primer extension reactions. In one embodiment, the four or more primers corresponding to the entire portion of the target nucleic acid to be sequenced are combined in a single reaction mixture. In another embodiment, two or more primers are combined in one reaction mixture, and two or more primers are combined in an additional reaction mixture or mixtures. Alternatively, the four or more primers are each added to a separate reaction mixture. In one embodiment, the primers comprise a tag sequence and are immobilized to the solid support via hybridization to a complementary capture oligonucleotide conjugated to the solid support. In another embodiment, the primers are immobilized to the solid support via a covalent attachment. In one embodiment, the solid support is a labeled microsphere. For example, the microspheres may be made of polystyrene. In one embodiment, the label of each microsphere is optically detected, based upon varying concentrations of at least two dyes. In certain embodiments, the labeled microspheres and the labeled dideoxynucleotide are detected by flow cytometry. In another embodiment, the solid support is a two-dimensional array and the immobilized primers are positionally defined on the array. The primers may be immobilized to the array via a covalent attachment or via a linker sequence. In certain embodiments, the extended primers with labeled dideoxynucleotides are detected by scanning the array.
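
As a rough illustration of how per-primer terminator calls could be combined into a contiguous sequence, consider the sketch below. It assumes, purely for illustration, that each primer has already been decoded (for example, from its microsphere label or array position) to the integer position it interrogates, and that the incorporated dideoxynucleotide has been decoded to one of A, C, G, or T. The function name and the `template_strand` flag are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def assemble_sequence(calls, template_strand=False):
    """Assemble a contiguous sequence from single-base extension calls.

    calls: dict mapping interrogated position (int) -> incorporated ddNTP base.
    By default the sequence of the extended (primer) strand is returned 5'->3'.
    With template_strand=True the reverse complement is returned, i.e. the
    target strand read 5'->3'.
    """
    if len(calls) < 4:
        raise ValueError("at least four positions are required")
    positions = sorted(calls)
    expected = list(range(positions[0], positions[0] + len(positions)))
    if positions != expected:
        raise ValueError("positions are not contiguous")
    bases = [calls[p] for p in positions]
    if template_strand:
        bases = [COMPLEMENT[b] for b in reversed(bases)]
    return "".join(bases)
```

The contiguity check reflects the requirement that a primer be provided for each nucleotide position of the sequence to be determined; a missing position is reported as an error rather than silently skipped.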

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,492,089, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of performing comparative hybridization by comparing the amount of test and reference nucleic acids hybridized to a nucleic acid array, the amounts determined by detecting a signal from the hybridized nucleic acids which are labeled with the same detectable label. This method is applicable to comparative hybridization methods in general and to CGH in particular. Accordingly, reference to CGH where the test and reference nucleic acid is genomic nucleic acid should be understood to encompass methods where the test and reference nucleic acids are other than genomic nucleic acids. In a preferred embodiment, CGH is performed using two samples of genomic nucleic acids: a test sample containing genomic nucleic acids, and a reference or control sample containing genomic nucleic acids with no known chromosomal or genetic abnormalities. The test sample and the reference sample are co-hybridized to a nucleic acid array that contains a plurality of nucleic acids or nucleic acid segments spotted onto a surface (such as a glass slide) at discrete locations. The array may contain target nucleic acid markers for certain known genetic mutations or disease states, or may represent (in aggregate) an entire chromosome, or the full chromosomal complement to obtain a genetic profile similar to karyotyping. In these approaches, the detectable label may be attached to the test and reference nucleic acids before hybridization or after hybridization. In another approach, the detectable label may be attached to one of the test or reference nucleic acids before hybridization while the label is attached to the other of the test or reference nucleic acid after hybridization. 
The detectable label may be attached covalently or non-covalently such as by a ligand-receptor interaction or by hybridization between complementary nucleotide sequences. In some embodiments, the test and reference samples are labeled with a detectable label; preferably the test and reference samples are labeled with the same detectable label; preferably the detectable label is a fluorochrome; preferably the detectable label is dCTP-Cy3. In certain aspects, methods are provided that allow for the use of a single label to determine the relative amount of test and reference nucleic acids hybridized to the array. Additionally or alternatively, detection of a genetic biomarker can include a method of determining differences between nucleic acid in a test sample and a reference sample, wherein the method involves amplifying nucleic acid sequence from the test sample nucleic acid and amplifying nucleic acid sequence from the reference sample nucleic acid, where one of the amplification reactions is conducted using dUTP and not dTTP and the other is conducted using dTTP and not dUTP; hybridizing to a nucleic acid array a solution comprising the amplified test sample and amplified reference sample; and determining the relative amount of hybridized test and reference nucleic acids bound to the array. In certain embodiments, determining the relative amount of hybridized test and reference nucleic acids includes a) determining a signal for the detectable label hybridized to the array representing the total of hybridized test and reference nucleic acid; b) treating the hybridized nucleic acids with an enzyme that selectively degrades DNA having uracil residues; and c) determining a signal for the detectable label hybridized to the array following step b), which signal represents one of the hybridized test or reference nucleic acid. In particularly preferred embodiments, the enzyme that selectively degrades DNA having uracil residues is uracil-DNA N-glycosylase (UNG). 
Additionally or alternatively, detection of a genetic biomarker can include a method of determining differences between nucleic acid in a test sample and a reference sample, where the method involves: (a) contacting under hybridization conditions a test sample containing nucleic acids and a reference sample containing nucleic acids to a surface containing a plurality of nucleic acid segments each immobilized at discrete locations on the surface, where the test sample and the reference sample are labeled before or after hybridization with the same detectable label; (b) determining the location and amount of the detectable label linked to nucleic acids hybridized to the surface; (c) selectively removing either the hybridized test sample nucleic acids or the hybridized reference sample nucleic acids; (d) determining the location and amount of the detectable label linked to nucleic acids hybridized to the surface following step (c); and (e) comparing the results of step (b) to the results of step (d) to detect differences in the nucleic acids of the test sample and reference sample. In some preferred embodiments, the step of selectively removing hybridized test nucleic acids or reference nucleic acids is performed by subjecting the nucleic acids to an enzyme that selectively degrades DNA having certain properties; preferably an enzyme that degrades DNA having uracil residues; more preferably the enzyme that selectively degrades DNA having uracil residues is uracil-DNA N-glycosylase (UNG). 
In some embodiments, the step of selectively removing hybridized test nucleic acids or reference nucleic acids by subjecting nucleic acids to an enzyme that selectively degrades DNA having uracil residues is achieved by (1) amplifying sequence from a test sample and amplifying sequence from a reference sample nucleic acid, where one of the amplification reactions is conducted using dUTP and not dTTP and the other is conducted using dTTP and not dUTP; (2) hybridizing the amplified nucleic acids; and (3) treating the hybridized nucleic acids with an enzyme that selectively degrades DNA having uracil residues. In some embodiments, the methods may be used to detect any differences between nucleic acids in a test sample and a reference sample, including differences in the amount of nucleic acids having a particular sequence or differences in nucleic acid sequences. In particularly preferred embodiments, the methods are used to detect genetic abnormalities in the test sample. The methods may be applied to CGH using a chromosomal spread or array-based CGH. In some preferred embodiments, the methods provided may be used to compare the expression of genes in a test sample versus that of a reference sample. Additionally or alternatively, detection of a genetic biomarker can include a method of performing comparative hybridization. The method includes comparing the amount of test and reference nucleic acids hybridized to a nucleic acid array, wherein the amount of hybridized test and reference nucleic acids is determined by detecting a signal from the hybridized nucleic acids which are labeled with the same detectable label. 
In one embodiment, the amount of hybridized test and reference nucleic acids is determined by: a) determining a signal for the detectable label hybridized to the array representing the total of hybridized test and reference nucleic acid; b) treating the hybridized nucleic acids to selectively remove one of the test or reference nucleic acids; c) determining a signal for the detectable label hybridized to the array following step b), which represents one of the hybridized test or reference nucleic acid; and d) determining a signal for the other of the hybridized test or reference nucleic acid by using the signals from a) and c). In certain preferred embodiments, the step of amplifying sequence from a test sample and amplifying sequence from a reference sample involves amplifying genomic DNA in the samples using random priming, as is well known in the art. Alternatively, the step of amplifying sequence from a test sample and amplifying sequence from a reference sample may involve using RNA to generate cDNA and amplifying the cDNA using random priming and/or amplifying specific sequences using particular primers. In certain preferred embodiments, the amplification reaction may be performed using one or more labeled nucleotides as a means to label the amplified nucleic acids with a detectable label; preferably both test and reference sample nucleic acids are amplified with the same labeled nucleotide; preferably the labeled nucleotide is dCTP-Cy3. Additionally or alternatively, detection of a genetic biomarker can include a method of comparing the expression of genes in a test sample versus that of a reference sample. The method includes comparing the amount of cDNA prepared from mRNA of a test sample and cDNA prepared from mRNA of a reference sample hybridized to a nucleic acid array, the amount of hybridized test and reference cDNA determined by detecting a signal from the hybridized cDNA which is labeled with the same detectable label. 
The method involves amplifying nucleic acid sequence from cDNA prepared from RNA of the test sample and amplifying nucleic acid sequence from cDNA prepared from RNA of the reference sample, where one of the amplification reactions is conducted using dUTP and not dTTP and the other is conducted using dTTP and not dUTP; hybridizing to the nucleic acid array a solution comprising the amplified test sample and amplified reference sample; and determining the relative amount of hybridized test and reference nucleic acids bound to the array. In certain embodiments, determining the relative amount of hybridized test and reference nucleic acids includes a) determining a signal for the detectable label hybridized to the array representing the total of hybridized test and reference nucleic acid; b) treating the hybridized nucleic acids with an enzyme that selectively degrades DNA having uracil residues; and c) determining a signal for the detectable label hybridized to the array following step b), which signal represents one of the hybridized test or reference nucleic acid.
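
The selective-removal readout described above can be sketched as a simple per-spot calculation: the signal remaining after enzyme treatment is attributed to the sample amplified with dTTP, and the signal lost is attributed to the sample amplified with dUTP. The sketch below is illustrative only; the function name, the `dutp_sample` parameter, and the assumption of complete, selective removal of uracil-containing DNA are simplifications not stated in the source.

```python
def resolve_removed_signals(total, after_treatment, dutp_sample="test"):
    """Attribute per-spot signals to test and reference samples.

    total:           per-spot signal before treatment (test + reference).
    after_treatment: per-spot signal after uracil-containing DNA is degraded.
    dutp_sample:     which sample was amplified with dUTP ("test" or "reference").
    Assumes idealized, complete removal of the dUTP-amplified sample.
    """
    if dutp_sample not in ("test", "reference"):
        raise ValueError("dutp_sample must be 'test' or 'reference'")
    if len(total) != len(after_treatment):
        raise ValueError("both reads must cover the same spots")
    removed = []
    for before, after in zip(total, after_treatment):
        if after > before:
            raise ValueError("signal increased after treatment")
        removed.append(before - after)
    remaining = list(after_treatment)
    if dutp_sample == "test":
        return {"test": removed, "reference": remaining}
    return {"test": remaining, "reference": removed}
```

In practice, incomplete degradation or signal loss from washing would need to be corrected for before applying this subtraction; the sketch does not attempt that.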

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,093,063, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for detecting a genomic nucleic acid of interest in a test sample without amplification and without the need for intact cells or nuclei. Generally, a genomic nucleic acid is hybridized to a labeled probe and anchored to a solid support through means other than nucleic acid hybridization. The genomic nucleic acid is detected by detecting the label in the hybridized complex on the solid support. The method may be used to detect a genetic abnormality, e.g., point mutation, gene duplication or deletion, and chromosomal translocation. The method may also be used for diagnosis or prognosis of a disease. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a target sequence in genomic nucleic acid, by: a. contacting a sample of genomic nucleic acid containing the target sequence with a probe specific for the target sequence and forming on a solid support a complex consisting of the genomic nucleic acid and the probe hybridized to the target sequence, wherein the probe contains a detectable label, the genomic nucleic acid is anchored to the solid support through means other than nucleic acid hybridization and the target sequence of the genomic nucleic acid has not been amplified; and b. detecting the presence of the target sequence in the genomic nucleic acid by detecting association of the label with the solid support. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting the presence or absence of a genetic abnormality in genomic nucleic acid, by: a. 
contacting a sample of genomic nucleic acid with a probe specific for the genetic abnormality and forming on a solid support a complex consisting of the genomic nucleic acid and the probe if the genetic abnormality is present in the genomic nucleic acid, wherein the genomic nucleic acid is anchored to the solid support through means other than nucleic acid hybridization and the target sequence of the genomic nucleic acid has not been amplified; and b. detecting the presence of the genetic abnormality by detecting association of the label with the solid support. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a genetic abnormality in a genomic nucleic acid, by: a. contacting a sample of genomic nucleic acid containing the genetic abnormality with a first probe specific for the genetic abnormality and forming a first complex on a solid support consisting of the genomic nucleic acid and the first probe, wherein the probe contains a detectable label, the genomic nucleic acid is anchored to the solid support through means other than nucleic acid hybridization and the target sequence of the genomic nucleic acid has not been amplified; b. contacting a sample of genomic nucleic acid with a second probe specific for the reference nucleic acid and forming a second complex on a solid support consisting of the reference nucleic acid and the second probe, wherein the second probe contains a detectable label; c. measuring the amount of the first complex formed by detecting the detectable label of the first probe associated with the complex and measuring the amount of second complex formed by detecting the detectable label of the second probe associated with the complex; and d. comparing the amount of the first complex to the amount of the second complex, wherein a difference in the amount of the two complexes is indicative of a genetic abnormality. In some embodiments, the genomic nucleic acid and reference nucleic acid are from the same sample. 
In another embodiment of any of the foregoing aspects, the genomic nucleic acid and the reference nucleic acid are from different samples, which may be from the same or different individuals. In another embodiment, the amounts of the first complex and the second complex are determined using the same solid support, and the detectable labels of the first probe and the second probe are different. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a genetic abnormality in a genomic nucleic acid, by: a. contacting a sample of genomic nucleic acid containing the genetic abnormality with a first probe specific for the genetic abnormality and forming a first complex on a solid support consisting of the genomic nucleic acid and the first probe, wherein the probe contains a detectable label, the genomic nucleic acid is anchored to the solid support through means other than nucleic acid hybridization and the target sequence of the genomic nucleic acid has not been amplified; b. contacting a sample of genomic nucleic acid with a second probe specific for the reference nucleic acid and forming a second complex on a solid support consisting of the reference nucleic acid and the second probe, wherein the second probe contains a detectable label; c. measuring the amount of the first complex formed by detecting the detectable label of the first probe associated with the complex and measuring the amount of second complex formed by detecting the detectable label of the second probe associated with the complex; d. obtaining a ratio of the amount of the first and the second complex; and e. comparing the ratio obtained to a ratio similarly obtained using genomic nucleic acid from a reference sample, wherein a difference in the ratios is indicative of a genetic abnormality. In preferred embodiments, the genomic nucleic acid and the reference nucleic acid are anchored to the solid support through interaction of biotin and avidin. 
In another preferred embodiment, the solid support is a bead. In another preferred embodiment, the first and second complexes are detected by flow cytometry. Additionally or alternatively, detection of a genetic biomarker can include a method for diagnosis of a disease in an individual by: a. contacting a sample of genomic nucleic acid from the individual with a probe complementary to nucleic acid sequence specific for the disease and forming on a solid support a complex consisting of the genomic nucleic acid and the probe if the genomic nucleic acid contains the nucleic acid sequence specific for the disease, wherein the probe contains a detectable label, the genomic nucleic acid is anchored to the solid support through means other than nucleic acid hybridization and the target sequence of the genomic nucleic acid has not been amplified; b. measuring the amount of the complex formed on the solid support by detecting the amount of detectable label associated with the support; and c. comparing the amount of complex formed to the amount of complex formed using genomic nucleic acid from a reference sample assayed under similar conditions, wherein a difference in amount of complex formed from the individual as compared to the reference sample is diagnostic for the disease. In one embodiment, the reference sample may be obtained from an individual assumed to be free of the disease. In another embodiment, the reference sample may be obtained from an individual known to have the disease. In another embodiment, the reference sample is obtained from the same individual after obtaining the first sample. In one embodiment, the method may be used for measuring tumor burden in an individual suspected of having cancer. In another embodiment, the method may be used for prognosis of a disease. The genomic nucleic acid may be anchored covalently or non-covalently to the solid support. 
In some embodiments, the genomic nucleic acid may be anchored non-covalently to the solid support via a “binding pair,” which refers to two molecules which form a complex through a specific interaction. Thus, the genomic nucleic acid can be captured on the solid support through an interaction between one member of the binding pair linked to the genomic nucleic acid and the other member of the binding pair coupled to the solid support. In a preferred embodiment, the binding pair is biotin and avidin, or variants of avidin, e.g., streptavidin and NeutrAvidin™. In other embodiments, the binding pair may be a ligand-receptor, a hormone-receptor, or an antigen-antibody pair. In some embodiments, the genomic nucleic acid may be anchored to the solid support through covalent linking. In one embodiment, the covalent linking of the genomic nucleic acid to the solid support is achieved through photoactive groups, e.g., azido, azidophenacyl, 4-nitrophenyl 3-diazopyruvate, psoralens, and psoralen derivatives. In another embodiment, the genomic nucleic acid can be cross-linked to a variety of solid surfaces by UV cross-linking. In another embodiment, the genomic nucleic acid may be anchored to the solid support through chemical coupling using chemical linkers. In another preferred embodiment, the genomic nucleic acid is genomic DNA. In another embodiment, the reference nucleic acid is a housekeeping gene or a single copy sequence in a chromosome. In some embodiments, the test sample or the reference sample containing genomic nucleic acid and reference nucleic acid, respectively, can be obtained from or accessed within cells, tissues, body fluids, plasma, serum, urine, central nervous system fluid, stool, bile duct, paraffin-embedded tissue, cell lysates, tissue lysates and the like. The test and reference nucleic acid may be obtained from any number of sources and by any method.
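
The ratio-based comparison described above (a ratio of first complex to second complex in the test material, compared to a ratio similarly obtained from a reference sample) might be sketched as follows. The function names and the 20% relative tolerance are assumptions introduced for illustration; the source does not specify how large a difference in ratios must be.

```python
def complex_ratio(first_complex, second_complex):
    """Ratio of the abnormality-specific complex to the reference-sequence complex."""
    if second_complex <= 0:
        raise ValueError("second complex amount must be positive")
    return first_complex / second_complex

def ratios_differ(test_ratio, reference_ratio, tolerance=0.2):
    """Return True when the test ratio deviates from the reference ratio.

    `tolerance` is a hypothetical relative cutoff (0.2 means 20%).
    """
    if reference_ratio <= 0:
        raise ValueError("reference ratio must be positive")
    return abs(test_ratio - reference_ratio) / reference_ratio > tolerance
```

For example, a test ratio of 1.5 against a reference ratio of 1.0 would be flagged under this cutoff, whereas a test ratio of 1.1 would not.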

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,076,074, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for detecting chromosomal abnormalities including balanced translocations, wherein the method involves performing array-based CGH in conjunction with probes for detecting the translocations. In one aspect, the methods involve hybridizing to a genomic nucleic acid array a test sample of genomic nucleic acid, a reference sample of nucleic acid, and at least one probe for detecting a balanced translocation; and determining the relative amount of test and reference nucleic acids hybridized to the array as well as determining hybridization to the array of the probe or probes for detecting the translocation. In a preferred embodiment, the methods are performed using two samples of genomic nucleic acid: a test sample containing genomic nucleic acid, and a reference or control sample containing genomic nucleic acid, the latter with no known chromosomal or genetic abnormalities. The test and reference samples are co-hybridized to a nucleic acid array containing a plurality of nucleic acids or nucleic acid segments spotted onto a surface (such as a glass slide) at discrete locations. The array may contain target nucleic acid markers for certain known genetic mutations or disease states, or may represent (in aggregate) an entire chromosome, or the full chromosomal complement to obtain a genetic profile. In these approaches, the detectable label may be attached to the test and reference nucleic acids before or after hybridization and in any order. The detectable label may be attached covalently or non-covalently such as by a ligand-receptor interaction or by hybridization between complementary nucleotide sequences. In addition, a probe for detecting translocations is hybridized to the genomic DNA.
In one approach, the probe is complementary to a moving segment of the genome which is translocated. The moving segment may be upstream or 5′ of the translocation breakpoint or downstream or 3′ of the translocation breakpoint. If the test sample does not contain the balanced translocation, then the probe will hybridize to the array where the moving segment is located in the wildtype. If the test sample does contain the balanced translocation, then the probe will again hybridize to the array where the moving segment is located in the wildtype and also to the area of the array which contains the nucleic acid which now contains the moving segment. Additionally, multiple probes all complementary to the moving segment being translocated can be used in a single hybridization.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 8,021,888, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includea method for performing a rapid hybridization assay. The method can beused to reduce the time to complete nucleic acid hybridization betweennucleic acids in solution and nucleic acid immobilized to a solidsupport. The method applies acoustic surface waves during any ofprehybridization to non-specific nucleic acid, hybridization to targetnucleic acid or during any washing steps following hybridization. In oneapproach, nucleic acid hybridization to immobilized nucleic acidincludes the following steps: a) contacting a solid support comprisingone or more immobilized nucleic acid probe molecules under hybridizationconditions with a non-specific blocking nucleic acid wherein the one ormore immobilized nucleic acid probe molecules are capable of hybridizingwith a sequence complementary thereto; b) contacting the solid supportunder hybridization conditions with a test sample containing nucleicacid target molecules; c) applying acoustic surface waves to thehybridization of step a) or step b) or both; and d) determining whetherthe one or more nucleic acid probes of the solid support have hybridizedto test sample nucleic acid target molecules. 
In another approach, nucleic acid hybridization to immobilized nucleic acid includes the following steps: a) contacting a solid support containing one or more immobilized nucleic acid probe molecules under hybridization conditions with a nucleic acid test sample containing one or more nucleic acid target molecules, the one or more immobilized nucleic acid probe molecules capable of hybridizing with a sequence complementary thereto; b) applying acoustic surface waves to the hybridization of step a); and c) determining whether the one or more nucleic acid probes of the solid support have hybridized to test sample nucleic acid target molecules. This method also may include a prehybridization step with non-specific nucleic acid such as described for the method further above. One or more washing steps may be applied after the prehybridization step or the hybridization step. One or more washing steps may include application of acoustic surface waves. With the application of acoustic waves, the prehybridization step may be limited to less than about 7 hours, more preferably less than about 5 hours and even more preferably less than about 3 hours. The hybridization step to target nucleic acid may be less than about 3 hours, more preferably less than about 2 hours and even more preferably less than about 1 hour. The washing steps are performed for less than about 1 hour, more preferably less than about 30 minutes. In a preferred embodiment, the method is performed in less than about 9 hours, more preferably less than about 7 hours and even more preferably less than about 5 hours. The method of hybridization to a test nucleic acid target molecule may further include one or more reference nucleic acid target molecules. The test and/or reference nucleic acid target molecules may be labeled with a detectable agent. In some embodiments, the solid support may contain an array of immobilized nucleic acid probe molecules.
In some embodiments, the immobilized nucleic acid probe molecules may include the sequence of a bacterial artificial chromosome.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Patent Application Publication No. 2018/0142304, which is herebyincorporated by reference in its entirety. For example, detection of agenetic biomarker can include methods and compositions for the detectionof mutations that are predictive of the responsiveness of a subjectdiagnosed with breast cancer, colorectal cancer, melanoma, or lungcancer to a particular therapeutic regimen. In some embodiments, themethods allow for rapid and sensitive detection of mutations in thetarget nucleic acid sequences of AKT1, ERBB2, FOXL2, IDH2, NRAS, RET,ALK, ERBB4, GNA11, KIT, PDGFRA, SMO, BRAF, FBXW7, GNAQ, KRAS, PIK3CA,STK11, CTNNB1, FGFR2, GNAS, MAP2K1, PIK3R1, TP53, DDR2, FGFR3, HRAS,MET, PTCH1, EGFR, FGFR4, IDH1, NOTCH1, and PTEN. Additionally oralternatively, detection of a genetic biomarker can include a method fordetecting at least one mutation in a plurality of cancer-related genesin a subject comprising (a) extracting genomic DNA from a formalin fixedparaffin-embedded tumor sample obtained from the subject; (b) generatinga library comprising amplicons corresponding to each of the plurality ofcancer-related genes, said plurality of cancer-related genes comprisingAKT1, ERBB2, FOXL2, IDH2, NRAS, RET, ALK, ERBB4, GNA11, KIT, PDGFRA,SMO, BRAF, FBXW7, GNAQ, KRAS, PIK3CA, STK11, CTNNB1, FGFR2, GNAS,MAP2K1, PIK3R1, TP53, DDR2, FGFR3, HRAS, MET, PTCH1, EGFR, FGFR4, IDH1,NOTCH1, and PTEN, wherein (i) generating said library occurs without theuse of a bait set comprising nucleic acid sequences that arecomplementary to at least one of the plurality of amplicons; and (ii)the quality of the genomic DNA extracted from the formalin fixedparaffin-embedded tumor sample is not assessed using quantitative PCRprior to generating the library; (c) ligating an adapter sequence to theends of the plurality of amplicons; and (d) detecting at least 
one mutation in at least one of the plurality of amplicons using high throughput massive parallel sequencing. In some embodiments of the method, the at least one mutation detected is a mutation in EGFR, KRAS, BRAF, NRAS, ERBB2 or PIK3CA. In one embodiment, the at least one mutation detected is selected from the group consisting of BRAF V600E, BRAF V600K, BRAF K483Q, BRAF G466V, BRAF G464V, BRAF E501V, BRAF E501K, EGFR ΔE746_A750, EGFR R680Q, EGFR G598E, KRAS A146T, KRAS R68M, KRAS L19F, KRAS G12V, KRAS G12D, KRAS G12C, KRAS G13D, KRAS G13C, KRAS G12A, KRAS G12S, KRAS Q22K, NRAS Q61K, NRAS Q61R, NRAS G12R, NRAS G12D, PIK3CA C420R, PIK3CA G106R, PIK3CA R38H, PIK3CA E453K, PIK3CA H1044R, PIK3CA N1044K, PIK3CA E545K, PIK3CA Q546H, PIK3CA H1047R, PIK3CA H1043L, PIK3CA M1043V, PIK3CA E542K, PIK3CA E542Q, PIK3CA T1053A, PIK3CA I121V, PIK3CA H1047L, ERBB2 L755S, ERBB2 S310Y, ERBB2 D769Y, ERBB2 S255R, DDR2 H92Y, DDR2 R31L, DDR2 L34P, DDR2 P381R and DDR2 K392N. In some embodiments of the method, the library comprising amplicons corresponding to each of the plurality of cancer-related genes is generated using no more than 10 ng of extracted genomic DNA from the formalin fixed paraffin-embedded tumor sample. In some embodiments of the method, the library comprising amplicons corresponding to each of the plurality of cancer-related genes is generated using 11-25 ng of extracted genomic DNA from the formalin fixed paraffin-embedded tumor sample. In certain embodiments, the high throughput massive parallel sequencing is performed using pyrosequencing, reversible dye-terminator sequencing, SOLiD sequencing, Ion semiconductor sequencing, Helioscope single molecule sequencing, sequencing by synthesis, sequencing by ligation, or SMRT™ sequencing. In some embodiments of the method, the adapter sequence is a P5 adapter, P7 adapter, P1 adapter, A adapter, or Ion Xpress™ barcode adapter. Additionally or alternatively, in some embodiments, the plurality of amplicons further comprises a unique index sequence.
In some embodiments, the formalin fixed paraffin-embedded tumor sample is a heterogeneous tumor. In certain embodiments, 5% of the cells of the heterogeneous tumor harbor at least one mutation in at least one of the plurality of amplicons.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0051329, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for determining the presence of a variant in one or more genes in a subject comprising: (a) providing raw sequencing data generated from a nucleic acid sequencing reaction on a nucleic acid sample from the subject using a nucleic acid sequencer; (b) removing low quality reads from the raw sequencing data that fail a quality filter; (c) trimming adapter and/or molecular identification (MID) sequences from the filtered raw sequencing data; (d) mapping the filtered raw sequencing data to a genomic reference sequence to generate mapped reads; (e) sorting and indexing the mapped reads; (f) adding read groups to a data file to generate a processed sequence file; (g) creating realigner targets; (h) performing local realignment of the processed sequence file to generate a re-aligned sequence file; (i) removing duplicate reads from the re-aligned sequence file; (j) analyzing coding regions of interest; and (k) generating a report that identifies whether the variant is present based on the analysis in step (j), wherein steps (g) and (j) are performed using a modified genomic alignment utility limited to nucleic acid regions containing the one or more genes of interest. In some embodiments, the method comprises performing the nucleic acid sequencing reaction on the nucleic acid sample from the subject using a nucleic acid sequencer to generate the raw sequencing data of step (a). In some embodiments, analyzing coding regions of interest comprises calling variants at every position in the regions of interest. In some embodiments, the regions of interest are padded by an additional 150 bases. In some embodiments, variant calling is performed with a modified GATK variant caller.
In some embodiments, mapping the reads to a genomic reference sequence is performed with a Burrows Wheeler Aligner (BWA). In some embodiments, mapping the reads to a genomic reference sequence does not comprise soft clipping. In some embodiments, the genomic reference sequence is GRCh37.1 human genome reference. In some embodiments, the sequencing method comprises emulsion PCR (emPCR), rolling circle amplification, or solid-phase amplification. In some embodiments, the solid-phase amplification is clonal bridge amplification. In some embodiments, the nucleic acid for sequence analysis is extracted from a biological sample from a subject. In some embodiments, the biological sample is a fluid or tissue sample. In some embodiments, the biological sample is a blood sample. In some embodiments, the nucleic acid is genomic DNA. In some embodiments, the nucleic acid is cDNA reverse transcribed from mRNA. In some embodiments, the nucleic acid sample is prepared prior to sequencing by performing one or more of the following methods: (a) shearing the nucleic acid; (b) concentrating the nucleic acid sample; (c) size selecting the nucleic acid molecules in a sheared nucleic acid sample; (d) repairing ends of the nucleic acid molecules in the sample using a DNA polymerase; (e) attaching one or more adapter sequences; (f) amplifying nucleic acids to increase the proportion of nucleic acids having an attached adapter sequence; (g) enriching the nucleic acid sample for one or more genes of interest; and/or (h) quantifying the nucleic acid sample immediately prior to sequencing. In some embodiments, the one or more adapter sequences comprise nucleic acid sequences for priming the sequencing reaction and/or a nucleic acid amplification reaction. In some embodiments, the one or more adapter sequences comprise a molecular identification (MID) tag.
In someembodiments, enriching the nucleic acid sample for one or more genes ofinterest comprises exon capture using one or more biotinylated RNAbaits. In some embodiments, the nucleic acid for sequence analysis isobtained from a subject that is a mammal. In some embodiments, thesubject is a human patient. In some embodiments, the subject is a humansuspected of having cancer or suspected of being at risk of developing acancer. In some embodiments, the methods provided further compriseconfirming the presence of the one or more variants by sequencing.Additionally or alternatively, detection of a genetic biomarker caninclude systems comprising one or more electronic processors configuredto: (a) remove low quality reads from the raw sequencing data that faila quality filter; (b) trim adapter and/or molecular identification (MID)sequences from the filtered raw sequencing data; (c) map the filteredraw sequencing data to a genomic reference sequence to generate mappedreads; (d) sort and index the mapped reads; (e) add read groups to adata file to generate a processed sequence file; (f) create realignertargets; (g) perform local realignment of the processed sequence file togenerate a re-aligned sequence file; (h) remove of duplicate reads fromthe re-aligned sequence file; and (i) analyze coding regions ofinterest. In some embodiments, analyzing coding regions of interestcomprises calling variants at every position in the regions of interest.In some embodiments, the regions of interest are padded by an additional150 bases. In some embodiments, variant calling is performed with amodified GATK variant caller. In some embodiments, mapping the reads toa genomic reference sequence is performed with a Burrows Wheeler Aligner(BWA). In some embodiments, mapping the reads to a genomic referencesequence does not comprise soft clipping. 
In some embodiments, thegenomic reference sequence is GRCh37.1 human genome reference.Additionally or alternatively, detection of a genetic biomarker caninclude non-transitory computer-readable media having instructionsstored thereon, the instructions comprising: (a) instructions to removelow quality reads that fail a quality filter; (b) instructions to trimadapter and MID sequences from the filtered raw sequencing data; (c)instructions to map the filtered raw sequencing data to a genomicreference sequence to generate mapped reads; (d) instructions to sortand index the mapped reads; (e) instructions to add read groups to adata file to generate a processed sequence file; (f) instructions tocreate realigner targets; (g) instructions to perform local realignmentof the processed sequence file to generate a re-aligned sequence file;(h) instructions to remove duplicate reads from the re-aligned sequencefile; and (i) instructions to analyze coding regions of interest. Insome embodiments, analyzing coding regions of interest comprises callingvariants at every position in the regions of interest. In someembodiments, the regions of interest are padded by an additional 150bases. In some embodiments, variant calling is performed with a modifiedGATK variant caller. In some embodiments, mapping the reads to a genomicreference sequence is performed with a Burrows Wheeler Aligner (BWA). Insome embodiments, mapping the reads to a genomic reference sequence doesnot comprise soft clipping. In some embodiments, the genomic referencesequence is GRCh37.1 human genome reference.
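The early read-preparation steps recited above (removing reads that fail a quality filter, trimming adapter sequences, and padding regions of interest) might be sketched as follows. This is only an illustration under stated assumptions: the `Read` type, the mean-Phred cutoff of 20, the exact-match adapter search, and the choice to pad each side of a region by 150 bases are hypothetical and are not taken from the referenced publication, which does not specify them here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical in-memory read: name, bases, per-base Phred scores."""
    name: str
    seq: str
    quals: Tuple[int, ...]


def passes_quality(read: Read, min_mean_q: float = 20.0) -> bool:
    # Assumed quality filter: keep reads whose mean Phred score meets the cutoff.
    return len(read.quals) > 0 and sum(read.quals) / len(read.quals) >= min_mean_q


def trim_adapter(read: Read, adapter: str) -> Read:
    # Assumed trimming rule: cut the read at the first exact adapter match.
    idx = read.seq.find(adapter)
    if idx == -1:
        return read
    return Read(read.name, read.seq[:idx], read.quals[:idx])


def preprocess(reads: List[Read], adapter: str, min_mean_q: float = 20.0) -> List[Read]:
    # Filter first, then trim, mirroring the order of the recited steps.
    kept = [r for r in reads if passes_quality(r, min_mean_q)]
    return [trim_adapter(r, adapter) for r in kept]


def pad_region(start: int, end: int, pad: int = 150) -> Tuple[int, int]:
    # Assumption: the 150-base padding is applied on each side of the region,
    # clamped so the start never goes below zero.
    return max(0, start - pad), end + pad
```

Mapping, realignment, duplicate removal, and variant calling are deliberately left out of the sketch, since in practice they are delegated to dedicated tools such as the BWA aligner and the GATK variant caller named in the text.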

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0316149, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include apparatus, systems, and methods for classifying genetic variants. In some embodiments, a standardized, rules-based process may provide a variant pathogenicity risk score based on clinical grade information in a CLIA-certified laboratory. Such a standardized system may provide reliable pathogenicity scores for DNA variants encountered in a clinical laboratory setting. In some embodiments, a sample of DNA may be obtained from a patient, who may or may not have been diagnosed with a disease or other medical condition. From the sample, the patient's genome may be sequenced in whole or in part. The result of sequencing may then be compared, e.g., to one or more reference genomes to identify variants in the patient's genome. One or more of the variants may be compared to databases of known variants. The result of that comparison may be identification of one or more previously unknown variants, one or more variants that are known but unclassified, or both. In some embodiments, an unclassified variant may be evaluated against one or more objective criteria. For example, in an embodiment, a variant may be assigned a starting score. Application of one or more objective criteria may cause additions to and subtractions from the score, leading to a final score that may be used to classify the variant. In some embodiments, classification of one or more previously-classified variants may be revisited, e.g., periodically, to reevaluate the variants in light of new information gained since the previous evaluation.
Additionally or alternatively, detection of a genetic biomarker can include a method of assigning a score to a genetic variant, where the score is based on multiple scoring criteria and reflects an estimate of pathogenicity of the variant. The method comprises identifying the variant in sequenced DNA obtained from a patient and assigning a starting score to the variant, where the starting score is a single numeric value that is associated with variants of unknown significance. In some embodiments, the method also comprises: calculating a first score adjustment that is based on objective evaluation of minor evidence and splicing predictions; calculating a second score adjustment that is based on objective evidence of the frequency with which the variant occurs in a general population; calculating a third score adjustment that is based on objective evidence of the frequency with which the variant occurs in clinically characterized patients; calculating a fourth score adjustment that is based on objective evidence of the frequency with which the variant has been observed to co-occur with one or more other variants that are known to be pathogenic; calculating a fifth score adjustment that is based on objective evidence of a degree to which the variant exhibits segregation within one or more families; calculating a sixth score adjustment that is based on objective evidence of association between the variant and one or more disease phenotypes within data describing one or more families; and calculating a seventh score adjustment based on objective evidence regarding whether the variant affects functions of one or more proteins that are known to be associated with disease.
In some embodiments, the method also comprises calculating a variant score based on the starting score, the first score adjustment, the second score adjustment, the third score adjustment, the fourth score adjustment, the fifth score adjustment, the sixth score adjustment, and the seventh score adjustment, the variant score being a single numeric value. The method further comprises assigning the variant to an assigned classification based solely on the variant score, where the assigned classification is one of a group that consists of a plurality of classifications, each classification in the plurality being associated with a respective different evaluation of variant pathogenicity.
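The scoring scheme described above (a starting score, seven score adjustments, and a classification based solely on the resulting single numeric value) might be sketched as follows. The starting score of 0.0, the simple additive combination, the score boundaries, and the five class labels are hypothetical placeholders chosen for illustration; the text specifies none of them.

```python
from typing import List, Sequence, Tuple

# Hypothetical starting score for a variant of unknown significance.
STARTING_SCORE = 0.0

# Hypothetical upper bounds (exclusive) and labels; scores at or above the
# last bound fall into the final class.
CLASS_BOUNDS: List[Tuple[float, str]] = [
    (-3.0, "benign"),
    (-1.0, "likely benign"),
    (1.0, "uncertain significance"),
    (3.0, "likely pathogenic"),
]
TOP_CLASS = "pathogenic"


def variant_score(adjustments: Sequence[float], start: float = STARTING_SCORE) -> float:
    # One adjustment per criterion: minor evidence and splicing, population
    # frequency, patient frequency, co-occurrence, segregation, phenotype
    # association, and protein function.
    if len(adjustments) != 7:
        raise ValueError("expected exactly seven score adjustments")
    # Assumption: adjustments are combined by simple addition.
    return start + sum(adjustments)


def classify(score: float) -> str:
    # Classification depends only on the single numeric variant score.
    for upper, label in CLASS_BOUNDS:
        if score < upper:
            return label
    return TOP_CLASS
```

For example, `classify(variant_score([0.5, -1.0, 0.0, 0.0, 1.0, 2.0, 1.5]))` evaluates a score of 4.0 against the assumed bounds. Because the class depends only on the score, reevaluating a previously classified variant amounts to recomputing the adjustments and calling `classify` again.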

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0009287, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include apparatus, systems, and methods of detecting variations in the number of copies of genetic subsequences within the genome of an organism. According to some embodiments, samples of genetic material, including DNA, may be taken from several patients. Sections of the patients' DNA may then be sequenced, e.g., through a process that includes, for each patient, purifying, concentrating, and fragmenting that patient's DNA. Each fragment may receive a molecular label that identifies the patient from whom the DNA was received and the DNA may otherwise be modified in preparation for sequencing (e.g., through one or more steps that may include one or more of: filtration, amplification, and modification such as to attach primers to the fragments). The fragments from several patients may be pooled, and the pool may include, e.g., one or more controls that comprise known genetic material. The fragments in the pool may then be sequenced, and the results of the sequencing may be stored, e.g., as one or more computer files. The results may then be processed, e.g., by one or more computer systems, to identify possible copy number variations in the patients from whom the samples were taken. For example, in some embodiments, the sequences may be demultiplexed to identify the respective patients whose DNA each sequence represents. Each patient's samples may then be aligned to a reference genome, and coverage for each base pair in each region of interest on the patient's genome may then be determined. From the base pair coverage, the coverage of one or more subunits of the patient's genome may then be determined. For example, in some embodiments, the coverage of multiple exons may be determined.
In some embodiments, the measurements of the coverage may then go through one or more steps of normalization. For example, the mean coverage of one or more amplicons known to be on autosomes may be compared to the mean coverage of one or more amplicons known to be on the X chromosome to provide a rough estimate of the number of X chromosomes (viz., one or two) in the patient's karyotype. If the patient is determined to have only one X chromosome, normalization may include doubling the coverage of all amplicons known to come from the X chromosome. Following normalization, reference values may be calculated for each amplicon, and CNV may be detected by comparing actual coverage values for each patient's amplicons with the calculated reference values. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting copy number variation (CNV) in the DNA of a plurality of patients. The method comprises receiving a plurality of samples, each sample containing DNA from a single patient, and from each sample, generating a plurality of fragments of DNA. The method also comprises barcoding each of the fragments with an identifier that uniquely identifies the respective patient from whom the DNA was received, pooling the plurality of samples into a DNA library, and subjecting the DNA library to one or more stages of filtering to increase the relative concentration of fragments within a plurality of selected regions of interest. In some embodiments, the method further comprises producing sequencing data for the plurality of patients by sequencing the filtered DNA library, demultiplexing the sequencing data, and, for each patient, generating coverage data by identifying, for each of the regions of interest, coverage of each region of interest in the sequencing data.
In some embodiments, the method comprises generatingnormalized coverage data from the coverage data and generating referencecoverage, common to all samples, for each region of interest, thegeneration of the reference coverage being based upon the normalizedcoverage data. In some embodiments, the method also comprisesautomatically detecting CNV for at least one subsequence of at least oneof the regions of interest of at least one of the patients based uponcomparing the reference coverage to the normalized coverage data andproviding output that identifies the patient, the subsequence, and theCNV. In some embodiments, generating normalized coverage data from thecoverage data comprises generating raw coverage data for each patient byat least generating an estimate of the number of the patient's Xchromosomes based on the coverage data and scaling the patient'scoverage of at least one region of interest that is known to be X-linkedand further comprises generating normalized coverage data from the rawcoverage data. In some embodiments, generating the estimate of thenumber of each patient's X chromosomes occurs without reference to anydemographic information about the respective patient and withoutreference to any information about the respective patient's phenotype.Alternatively, in some embodiments, the method comprises, for eachpatient, generating a second estimate of the number of the patient's Xchromosomes based on the normalized coverage data and also comprisesrevising the normalized coverage data based on the second estimates. Insome embodiments, generating the second estimate of the number of eachpatient's X chromosomes occurs without reference to any demographicinformation about the respective patient and without reference to anyinformation about the respective patient's phenotype. 
In someembodiments, the coverage data comprises per-base coverage for eachregion of interest within the sequencing data; generating raw coveragedata comprises scaling the patient's per-base coverage of the at leastone region of interest that is known to be X-linked; the normalizedcoverage data comprises per-base coverage; the reference coveragecomprises per-base coverage for each position within each region ofinterest; and automatically detecting CNV is based upon the per-basereference coverage and the normalized coverage data. In someembodiments, each region of interest corresponds respectively to exactlyone exon and contains that exon. In some embodiments, the methodcomprises automatically detecting CNV for a plurality of adjacent exonswithin a gene of a patient, each of the adjacent exons having the sameCNV and automatically rolling up the CNV of the adjacent exons. Further,in an embodiment, the method comprises automatically detecting CNV forall exons within a gene of a patient, each of the exons within the genehaving the same CNV, and automatically rolling up the CNV of the exonswithin the gene into a single CNV for the gene. In some embodiments, themethod comprises subjecting the DNA library to one or more stages ofamplification, such that each region of interest is an amplicon. In someembodiments, the coverage data comprises per-base coverage for eachregion of interest within the sequencing data; the normalized coveragedata comprises per-base coverage; the reference coverage comprisesper-base coverage for each position within each region of interest; andautomatically detecting CNV is based upon the per-base referencecoverage and the normalized coverage data.
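The coverage-based CNV workflow described above (estimating the number of X chromosomes, doubling X-linked coverage for single-X patients, normalizing, building a reference common to all samples, comparing, and rolling up adjacent exons) might be sketched as follows. The ratio cutoffs, the per-sample mean normalization, and the use of a cross-sample median as the reference are assumptions made for illustration only; the text does not fix these choices.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Set, Tuple

X_RATIO_CUTOFF = 0.75  # hypothetical: X/autosome ratio below this means one X
LOSS_RATIO = 0.75      # hypothetical: normalized/reference ratio below this is a loss
GAIN_RATIO = 1.25      # hypothetical: ratio above this is a gain

Coverage = Dict[str, float]  # region name -> coverage


def estimate_x_count(cov: Coverage, x_linked: Set[str]) -> int:
    # Compare mean X-linked coverage with mean autosomal coverage only;
    # no demographic or phenotype information is used.
    autosomal = [v for k, v in cov.items() if k not in x_linked]
    x_cov = [v for k, v in cov.items() if k in x_linked]
    return 1 if mean(x_cov) / mean(autosomal) < X_RATIO_CUTOFF else 2


def normalize(cov: Coverage, x_linked: Set[str]) -> Coverage:
    # Double X-linked coverage for single-X patients, then divide by the
    # sample mean (assumed normalization) so samples are comparable.
    factor = 2.0 if estimate_x_count(cov, x_linked) == 1 else 1.0
    raw = {k: (v * factor if k in x_linked else v) for k, v in cov.items()}
    sample_mean = mean(list(raw.values()))
    return {k: v / sample_mean for k, v in raw.items()}


def reference_coverage(normalized: Dict[str, Coverage]) -> Coverage:
    # Assumed reference: per-region median across all samples.
    regions = next(iter(normalized.values())).keys()
    return {r: median(s[r] for s in normalized.values()) for r in regions}


def call_cnv(norm: Coverage, ref: Coverage) -> Dict[str, str]:
    calls = {}
    for region, value in norm.items():
        ratio = value / ref[region]
        calls[region] = "loss" if ratio < LOSS_RATIO else "gain" if ratio > GAIN_RATIO else "normal"
    return calls


def roll_up(calls: Dict[str, str], ordered_exons: List[str]) -> List[Tuple[str, str, str]]:
    # Merge runs of adjacent exons sharing the same non-normal call into
    # (first_exon, last_exon, call) segments.
    segments: List[Tuple[str, str, str]] = []
    current: Optional[List[str]] = None
    for exon in ordered_exons:
        call = calls[exon]
        if current and current[2] == call:
            current[1] = exon
        else:
            if current:
                segments.append((current[0], current[1], current[2]))
            current = [exon, exon, call] if call != "normal" else None
    if current:
        segments.append((current[0], current[1], current[2]))
    return segments
```

A median is used for the reference in this sketch so that a single sample carrying a deletion or duplication shifts the reference less than it would shift a mean; other robust estimators would serve the same purpose.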

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2009/0088328, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of determining the printing quality of nucleic acid arrays and methods of determining the efficiency of procedures to block non-specific binding on nucleic acid arrays. Additionally or alternatively, detection of a genetic biomarker can include utilizing fluorescence detection to evaluate the quality of a printed nucleic acid array without the need to add or otherwise link a fluorescent compound or dye to the nucleic acid. Nucleic acid arrays suitable for this analysis are those where the spots of the array are formed by printing a solution that contains the nucleic acid and one or more ions. Thus, the array is formed from nucleic acid in an ionic solution and the printing quality is evaluated by the fluorescence associated with each printed spot. Printing quality may be evaluated by measuring the intensity of fluorescence at the location of each printed sample, and/or by measuring the “morphology” (i.e., shape) of the printed sample. Printed spots can be “imaged” by measuring fluorescence across a spotted sample in two dimensions. The resulting image of a printed spot can be compared with a reference printed image expected for the printing equipment and solid phase used. The methods can be used to determine the quality of specific spots on an array, to determine the quality of specific regions of an array, or to determine the quality of an array as a whole. Spot quality and/or array quality can be detected immediately following array printing or after the array is subject to processing steps prior to hybridization. Such steps may include exposing the array to heat, humidity, UV irradiation, a blocking procedure, and/or washing.
In the case where the quality of a blocking step for non-specific binding is assessed, the quality of blocking can be determined by measuring fluorescence at each loaded sample prior to and following a blocking procedure. A decrease in the fluorescence after the washing and/or blocking procedure indicates the efficiency of the blocking and/or washing step. Additionally or alternatively, detection of a genetic biomarker can include a method for determining the printing quality of a nucleic acid array prior to hybridization, said method comprising: (a) printing an array of nucleic acid samples onto a solid support, each sample comprising nucleic acid in an ionic solution; and (b) detecting fluorescence of printed samples to determine the quality of printing.
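The before-and-after fluorescence comparison described above might be expressed numerically as follows. Treating the fractional decrease in fluorescence at each spot as the measure of blocking efficiency is an assumption made for illustration; the text states only that a decrease in fluorescence indicates the efficiency of the blocking and/or washing step.

```python
from typing import List, Sequence


def blocking_efficiency(before: Sequence[float], after: Sequence[float]) -> List[float]:
    """Per-spot fractional decrease in fluorescence (hypothetical metric)."""
    if len(before) != len(after):
        raise ValueError("before and after must cover the same spots")
    result = []
    for b, a in zip(before, after):
        if b <= 0:
            raise ValueError("pre-blocking fluorescence must be positive")
        # 1.0 means the signal disappeared entirely; 0.0 means no change.
        result.append((b - a) / b)
    return result
```

Under this metric, spots with values near zero would be candidates for incomplete blocking; any pass or fail threshold would have to be chosen for the particular printing equipment and solid phase used.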

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Patent Application Publication No. 2006/0292576, which is herebyincorporated by reference in its entirety. For example, detection of agenetic biomarker can include methods of detecting and analyzingchromosomal abnormalities of interest in a test sample. In preferredembodiments, nucleic acids from a test sample are hybridized to twoprobes complementary to different segments of a gene of interest ordifferent segments to a chromosomal fragment of interest. One probe isanchored to the solid support while the second probe comprises adetectable label which is used for detection. This method provides forthe capture and detection of target nucleic acids hybridizing to bothprobes simultaneously. Hybridization of both the first and second probesto the same target nucleic acid indicates detection of a chromosomalabnormality in the target nucleic acid, while hybridization of only oneof the probes to the same target nucleic acid indicates the absence of agenetic abnormality in the target nucleic acid. The anchored probe maybe anchored covalently or non-covalently to the support. If non-covalentattachment is used, a preferred method is via a “binding pair,” whichrefers to two molecules which form a complex through a specificinteraction. Thus, the nucleic acid probe can be captured on the solidsupport through an interaction between one member of the binding pairlinked to the probe and the other member of the binding pair coupled tothe solid support. A binding pair member also can be used to link thedetectable label to the other nucleic acid probe. In a preferredembodiment, the binding pair is biotin and avidin or streptavidin. 
In other embodiments, the binding pair is comprised of a ligand-receptor, a hormone-receptor, an antigen-antibody, or an oligonucleotide-complement. In some embodiments, the two probes may be hybridized to the target nucleic acid in a liquid and then the complex can be captured by a solid support. The anchored probe in this approach is preferably anchored non-covalently and preferably via a binding pair. In other variants, the solid support may first comprise the anchored probe, which is then contacted for hybridization with the target nucleic acid, alone or together with the labeled probe. Additionally or alternatively, detection of a genetic biomarker can include methods of detecting the presence or absence of a genetic abnormality in a target nucleic acid in a test sample. The method includes forming on a solid support a complex comprising the target nucleic acid, a first nucleic acid probe hybridizing to a first segment of the target nucleic acid, the first nucleic acid probe labeled with a detectable label, and a second nucleic acid probe hybridizing to a second segment of the target nucleic acid, the second nucleic acid probe anchored to the solid support. The complex is detected by detecting incorporated detectable label, wherein hybridization of both the first and second probes to the same target nucleic acid indicates the presence of genetic abnormality in the target nucleic acid, while hybridization of only one of the probes to the same target nucleic acid indicates the absence of a genetic abnormality in the target nucleic acid. Additionally or alternatively, detection of a genetic biomarker can include methods of detecting a chromosomal translocation in the nucleic acid of a test sample. The method includes the hybridization of two nucleic acid probes, one complementary to a sequence of the donor chromosome segment and the other complementary to a sequence of the recipient chromosome which adjoins or is near to the inserted donor chromosome segment.
One probe is anchored to the support and the other probe is labeled with a detectable label. A test sample of genomic DNA hybridizing to both probes will form a complex on the support or such a complex is preformed and then captured on a solid support and detected via the detectable label. The quantity of captured, labeled complex from the test sample represents the test value. If the test value shows that label is associated with captured hybridization complexes, the test sample is determined to contain the chromosomal translocation. In one embodiment, one can compare the test value for the test sample with a test value from a reference sample which contains the target gene but lacks the translocation. Additionally or alternatively, detection of a genetic biomarker can include methods of detecting a duplication or deletion in a particular target chromosomal region or gene in an individual. The method includes forming on a solid support a complex comprising the nucleic acid associated with the particular chromosomal region or gene which is obtained from the sample, a labeled nucleic acid probe hybridizing to a first segment of the particular chromosomal region or gene, and a second nucleic acid probe hybridizing to a second segment of the particular chromosomal region or gene, wherein the second nucleic acid probe is anchored to the solid support. In a preferred embodiment, the target nucleic acid is genomic DNA which has been fragmented. The quantity of captured, labeled complex from the test sample represents the test value. The test value may be compared to a control value which may be obtained from the quantity of complex obtained from a different target gene or chromosomal region preferably from the same sample. A higher test value as compared to the control value is indicative of duplication or amplification, whereas a lower test value as compared to control value is indicative of a chromosomal or gene deletion.
In another approach, one can determine a ratio of the test value of the test sample to the control value in that sample and compare to a similar ratio representing the test value and control value of a reference sample which contains nucleic acid that does not contain a deletion, duplication, or amplification in the chromosomal region or gene of interest. Additionally or alternatively, detection of a genetic biomarker can include methods of determining the diagnosis, predicting response to therapy, detecting minimal residual disease or prognosis of a disease in an individual. In this method, a complex is formed between a target nucleic acid from a test sample, a probe comprising a detectable label and hybridizing to one segment of a target nucleic acid and a second probe anchored to the support and hybridizing to a second segment of the target nucleic acid. The amount of complex on the solid support is measured through detection of incorporated detectable label of the first probe. The amount of complex formed is compared to the amount of complex formed in a similar manner from a sample obtained from a reference sample. The reference sample may be obtained from a normal individual, wherein a difference between the measurements from the test and reference samples is correlated with diagnosis or prognosis of a disease. Additionally or alternatively, detection of a genetic biomarker can include methods of monitoring treatment or progression of a disease. In this method samples are obtained from a patient at different points in time (e.g., before and after a regimen of treatment of the disease). A complex is formed between a target nucleic acid from the first sample, a probe comprising a detectable label and hybridizing to one segment of a target nucleic acid and a second probe anchored to the support and hybridizing to a second segment of a target nucleic acid.
The amount of complex on the support from the first sample is compared to the amount of complex formed using the same probes and target nucleic acid from the second sample. A difference in amount of complex formed can be correlated to progression of the disease or success of the treatment regimen. Additionally or alternatively, detection of a genetic biomarker can include methods of measuring tumor burden in an individual. In this method, a complex is formed on a solid support between a target nucleic acid from a test sample, a probe comprising a detectable label and hybridizing to one segment of a target nucleic acid and a second probe anchored to the support and hybridizing to a second segment of the target nucleic acid. The amount of complex on the solid support is measured through detection of incorporated detectable label of the first probe. The amount of complex formed is compared to a reference value or set of values of the amount of complex formed in a similar manner from a sample obtained from a reference sample, from a patient whose tumor burden is known, to determine tumor burden of the test sample. In some embodiments, methods of determining tumor burden include the formation of two complexes on solid support. The first complex comprises a first target nucleic acid from a test sample from the individual and two nucleic acid probes; one containing a detectable label and the other anchored to the support. The second complex comprises a second or control target nucleic acid from the test sample and two different nucleic acid probes, one containing a detectable label, distinguishable from the label of the first complex, and the other probe anchored to the solid support. The amount of each of the two complexes is measured and a test ratio determined. This ratio is then compared to a reference ratio or set of ratios that correlate the test ratio to tumor burden.
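The ratio-based comparison described above for duplications and deletions (a test value and a control value from the same sample, compared to the corresponding ratio from a reference sample) can be sketched as follows. This is an illustrative sketch only: the function name `classify_copy_number`, the `tolerance` parameter, and its default of 0.25 are assumptions introduced here and are not values taken from the referenced publication.

```python
def classify_copy_number(test_value, control_value,
                         ref_test_value, ref_control_value,
                         tolerance=0.25):
    """Compare a sample's test/control ratio to a reference sample's ratio.

    test_value        -- captured, labeled complex for the region of interest
    control_value     -- captured, labeled complex for a different region or
                         gene, preferably from the same sample
    ref_test_value,
    ref_control_value -- the same two quantities measured on a reference
                         sample without a deletion, duplication, or
                         amplification in the region of interest
    tolerance         -- hypothetical dead band around a normalized ratio of
                         1.0; chosen for illustration only
    """
    if test_value < 0:
        raise ValueError("test_value must not be negative")
    if min(control_value, ref_test_value, ref_control_value) <= 0:
        raise ValueError("control and reference values must be positive")

    sample_ratio = test_value / control_value
    reference_ratio = ref_test_value / ref_control_value
    normalized = sample_ratio / reference_ratio

    if normalized > 1 + tolerance:
        return "duplication or amplification"
    if normalized < 1 - tolerance:
        return "deletion"
    return "no change detected"
```

In practice, any threshold would need to be established empirically for the particular probes, labels, and assay conditions; the dead band here only shows where such a threshold would enter the comparison.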

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2006/0127918, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a nucleic acid-containing substrate that includes: (a) an organosilane-pretreated surface; (b) a polymer film cross-linked to the organosilane-pretreated surface; and (c) a nucleic acid molecule bound to one or more of the polymer film and the organosilane-pretreated surface. In preferred embodiments, the polymer film is formed from a polymer comprising reactive groups, and the nucleic acid molecule has not been covalently modified to facilitate covalent attachment at the reactive groups. The nucleic acid molecule may be associated with or bound to one or more of the polymer film and the organosilane-pretreated surface through covalent and/or noncovalent interactions. In preferred embodiments, the nucleic acid molecule is at least about 250 nucleotides in length, and more preferably at least about 500 nucleotides in length. In one embodiment, the nucleic acid molecule is a DNA molecule present in the form of a bacterial artificial chromosome (BAC) or another suitable cloning vector (e.g., an E. coli P1 based artificial chromosome, a plasmid, a cosmid, and the like). In some embodiments, the bound nucleic acid molecule is present on the surface of the substrate at a concentration sufficient to detect a nucleic acid target molecule by nucleic acid hybridization methodology. For example, the nucleic acid molecule may be present at a concentration of at least about 500 copies/cm² on the surface of the substrate. More suitably the nucleic acid molecule is present on the surface of the substrate at a concentration of at least about 1000 copies/cm² and/or at least about 5000 copies/cm².
The nucleic acid molecule preferably remains substantially attached to the substrate when subjected to washing under high stringency conditions (e.g., when the slide is washed with a low salt buffer optionally including a non-ionic detergent at a relatively high temperature). In some embodiments, the organosilane is a modified silane molecule that includes alkyl groups. In one embodiment, the organosilane includes alkyl groups with six or more carbon atoms and preferably ten or more carbon atoms. The organosilane may include alkoxy groups. The organosilane may also include halide groups. In some embodiments, the polymer comprises reactive groups. Suitable reactive groups include electrophilic groups that react with nucleophilic groups under suitable conditions. For example, reactive groups may include amino-reactive groups (i.e., groups that react with the nitrogen atom of an amino group), thiol-reactive groups (i.e., groups that react with the sulfur atom of a thiol-group), hydroxyl-reactive groups (i.e., groups that react with the oxygen atom of a hydroxyl-group), and combinations thereof. In some embodiments, the polymer may include activated esters, epoxides, azlactones, activated hydroxyls, aldehydes, isocyanates, thioisocyanates, carboxylic acid chlorides, alkyl halides, maleimide, α-iodoacetamides, or combinations thereof. In one embodiment, the reactive group is an activated ester, and in particular, the activated ester may include an N-hydroxylsuccinimide ester. In some embodiments, the nucleic acid-containing substrate is configured as a nucleic acid microarray. The nucleic acid microarray may be suitable for performing comparative genomic hybridization analysis. In one embodiment, the nucleic acid microarray comprises genomic DNA cloned in bacterial artificial chromosomes (BACs). Additionally or alternatively, detection of a genetic biomarker can include a method for preparing a nucleic acid-containing substrate as described above.
The method typically includes: (a) pretreating a surface of the substrate with a composition that includes an organosilane; (b) coupling a polymer to the organosilane-pretreated surface to form a polymer film; and (c) binding a nucleic acid molecule to one or both of the organosilane-pretreated surface and the polymer film. In preferred embodiments, the polymer film is formed from a polymer comprising reactive groups, and the nucleic acid has not been covalently modified to facilitate covalent attachment at the reactive groups. The nucleic acid molecule may be associated with and/or bound to one or more of the polymer film and the organosilane-pretreated surface through covalent and/or noncovalent interactions. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting the presence and/or amount of a target nucleic acid molecule in a sample that includes: a) contacting the target molecule with a nucleic acid-containing substrate, which is prepared as described above, under suitable conditions for hybridizing the target to the nucleic acid of the substrate; and b) detecting the presence of the target molecule bound to the substrate. In preferred embodiments, the nucleic acid-containing substrate is a nucleic acid microarray, and detection of the presence and/or amount of the nucleic acid target is performed using comparative genomic hybridization analysis.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/125892, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and polynucleotide adapter compositions related to the detection of mutations in ctDNA present in samples derived from a subject diagnosed as having, or suspected of having, cancer. In some embodiments, the methods allow for rapid and sensitive detection and profiling of ctDNA mutations in various target nucleic acid sequences in the exons and/or introns of one or more cancer-related genes including, but not limited to, ALK, BRAF, EGFR, ERBB2, KIT, KRAS, MET, NRAS, NTRK1, PIK3CA, ROS1, and RET. The methods provide a framework for ultrasensitive ctDNA profiling achieved using accurate analytical models of detection limits. These qualities improve detection limits over previous methods for samples with limited DNA content. Additionally or alternatively, detection of a genetic biomarker can include a nucleic acid adapter comprising a first oligonucleotide strand and a second oligonucleotide strand, wherein (a) the first oligonucleotide strand (i) comprises a first proximal region and a first distal region, wherein the first proximal region comprises a first unique molecular identifier sequence and a first spacer sequence having the sequence 5′ TGACT 3′ (SEQ ID NO:), wherein the first spacer sequence is located 3′ to the first unique molecular identifier sequence; and (ii) does not comprise a degenerate or semi-degenerate sequence; (b) the second oligonucleotide strand (i) comprises a second proximal region and a second distal region, wherein the second proximal region comprises a second unique molecular identifier sequence and a second spacer sequence having the sequence 5′ GTCA 3′ (SEQ ID NO:), wherein the spacer sequence is located 5′ to the second unique molecular identifier; and (ii)
does not comprise a degenerate or semi-degenerate sequence; (c) the first proximal region of the first oligonucleotide strand hybridizes with the second proximal region of the second oligonucleotide strand; and (d) the first distal region of the first oligonucleotide strand does not hybridize with the second distal region of the second oligonucleotide strand. In some embodiments of the nucleic acid adapter, the “T” nucleotide located at the 3′ end of the first spacer sequence contains a phosphorothioate bond. In some embodiments of the nucleic acid adapter, the 5′ end of the first oligonucleotide strand is labelled with biotin. In other embodiments of the nucleic acid adapter, the 3′ end of the second oligonucleotide strand is labelled with biotin. In some embodiments, the nucleic acid adapter is used to sequence a double-stranded target nucleic acid molecule selected from the group consisting of double-stranded DNA or double-stranded RNA. The double-stranded DNA may be sheared genomic DNA, or cell-free DNA. In some embodiments, the nucleic acid adapter of the present technology further comprises at least two PCR primer binding sites, at least two sequencing primer binding sites, or any combination thereof. Additionally or alternatively, in some embodiments, the nucleic acid adapter of the present technology further comprises a sample-specific barcode sequence, wherein the sample-specific barcode sequence comprises 2-20 nucleotides.
Additionally or alternatively, detection of a genetic biomarker can include a method for detecting at least one mutation in a double-stranded circulating tumor DNA (ctDNA) molecule present in a sample obtained from a patient comprising (a) ligating a plurality of Y-shaped adapters to both ends of the double-stranded ctDNA molecule to form a double-stranded adapter-ctDNA complex, each Y-shaped adapter comprising a first oligonucleotide strand and a second oligonucleotide strand; (b) amplifying both strands of the adapter-ctDNA complex to produce first amplicons and second amplicons, wherein the first amplicons are derived from the first oligonucleotide strand, and the second amplicons are derived from the second oligonucleotide strand; (c) sequencing the first and second amplicons; and (d) detecting at least one mutation in the double-stranded ctDNA molecule, when a mutation detected in the first amplicons is consistent with a mutation detected in the second amplicons. In some embodiments of the method, the patient is diagnosed with ovarian cancer, breast cancer, colon cancer, lung cancer, prostate cancer, gastric cancer, pancreatic cancer, cervical cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, head and neck cancer, or brain cancer. In some embodiments, the method further comprises enriching the first amplicons and second amplicons with a plurality of bait sequences, wherein the plurality of bait sequences comprises at least one gene region that corresponds to each of a plurality of cancer-related genes. The plurality of cancer-related genes may comprise ALK, BRAF, EGFR, ERBB2, KIT, KRAS, MET, NRAS, NTRK1, PIK3CA, ROS1, and RET. Additionally or alternatively, in some embodiments of the method, the plurality of bait sequences are RNA baits, DNA baits, or a mixture of RNA baits and DNA baits. In certain embodiments, the plurality of bait sequences comprises a 1:1 mixture of RNA baits and DNA baits.
In other embodiments, the plurality of bait sequences comprises a mixture of RNA baits and DNA baits having a ratio of 2:1, 1.5:1, 0.75:1 or 0.5:1. In certain embodiments of the method, both of the 3′ ends of the double-stranded ctDNA molecule further comprise an “A”-overhang. In any of the above embodiments, each Y-shaped adapter further comprises at least two sequencing primer binding sites. Additionally or alternatively, in some embodiments, each Y-shaped adapter further comprises a patient-specific barcode sequence, wherein the patient-specific barcode sequence comprises 2-20 nucleotides. Each Y-shaped adapter of the present technology may be labelled with biotin. In some embodiments of the method, the sample comprises no more than 5 ng of cell-free DNA. In other embodiments, the sample comprises at least 6-30 ng of cell-free DNA. In certain embodiments, the sample is whole blood, serum, plasma, synovial fluid, lymphatic fluid, ascites fluid, or interstitial fluid.
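The confirmation step of the ctDNA method described above, in which a mutation is detected only when the call from the first amplicons is consistent with the call from the second amplicons, can be sketched as a set intersection. This is a simplified illustration, not the referenced publication's implementation: the function name `duplex_confirmed_mutations` and the representation of a mutation as a `(gene, position, reference_base, alternate_base)` tuple are assumptions, and the example calls are illustrative placeholders rather than data from any sample.

```python
def duplex_confirmed_mutations(first_amplicon_calls, second_amplicon_calls):
    """Return mutations called in both strand-derived amplicon families.

    Each call is a hashable tuple such as (gene, position, ref, alt).
    A mutation seen in only one amplicon family is treated as unconfirmed
    (for example, a possible amplification or sequencing error) and dropped.
    """
    confirmed = set(first_amplicon_calls) & set(second_amplicon_calls)
    return sorted(confirmed)


# Illustrative placeholder calls (not real sample data)
first_calls = [("KRAS", 35, "G", "A"), ("EGFR", 2573, "T", "G")]
second_calls = [("KRAS", 35, "G", "A"), ("BRAF", 1799, "T", "A")]
confirmed_calls = duplex_confirmed_mutations(first_calls, second_calls)
```

Here only the call present in both lists is retained. A production pipeline would typically also group reads by unique molecular identifier before comparing strands, which this sketch omits.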

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,389,234, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and systems for molecular profiling and using the results from molecular profiling to identify treatments for individuals. In some embodiments, the treatments were not identified initially as a treatment for the disease. Additionally or alternatively, detection of a genetic biomarker can include a method of identifying a candidate treatment for a subject in need thereof, comprising: performing an immunohistochemistry (IHC) analysis on a sample from the subject to determine an IHC expression profile on at least five proteins; performing a microarray analysis on the sample to determine a microarray expression profile on at least ten genes; performing a fluorescent in-situ hybridization (FISH) analysis on the sample to determine a FISH mutation profile on at least one gene; performing DNA sequencing on the sample to determine a sequencing mutation profile on at least one gene; and comparing the IHC expression profile, microarray expression profile, FISH mutation profile and sequencing mutation profile against a rules database. The rules database comprises a mapping of treatments whose biological activity is known against cancer cells that: i. overexpress or underexpress one or more proteins included in the IHC expression profile; ii. overexpress or underexpress one or more genes included in the microarray expression profile; iii. have no mutations, or one or more mutations in one or more genes included in the FISH mutation profile; and/or iv. have no mutations, or one or more mutations in one or more genes included in the sequencing mutation profile. The candidate treatment is identified if: i.
the comparison step indicates that the treatment should have biological activity against the cancer; and ii. the comparison step does not contraindicate the treatment for treating the cancer. In some embodiments, the IHC expression profiling comprises assaying one or more of SPARC, PGP, Her2/neu, ER, PR, c-kit, AR, CD52, PDGFR, TOP2A, TS, ERCC1, RRM1, BCRP, TOPO1, PTEN, MGMT, and MRP1. In some embodiments, the microarray expression profiling comprises assaying one or more of ABCC1, ABCG2, ADA, AR, ASNS, BCL2, BIRC5, BRCA1, BRCA2, CD33, CD52, CDA, CES2, DCK, DHFR, DNMT1, DNMT3A, DNMT3B, ECGF1, EGFR, EPHA2, ERBB2, ERCC1, ERCC3, ESR1, FLT1, FOLR2, FYN, GART, GNRH1, GSTP1, HCK, HDAC1, HIF1A, HSP90AA1, IL2RA, HSP90AA1, KDR, KIT, LCK, LYN, MGMT, MLH1, MS4A1, MSH2, NFKB1, NFKB2, OGFR, PDGFC, PDGFRA, PDGFRB, PGR, POLA1, PTEN, PTGS2, RAF1, RARA, RRM1, RRM2, RRM2B, RXRB, RXRG, SPARC, SRC, SSTR1, SSTR2, SSTR3, SSTR4, SSTR5, TK1, TNF, TOP1, TOP2A, TOP2B, TXNRD1, TYMS, VDR, VEGFA, VHL, YES1, and ZAP70. In some embodiments, the FISH mutation profiling comprises assaying EGFR and/or HER2. In some embodiments, the sequencing mutation profiling comprises assaying one or more of KRAS, BRAF, c-KIT and EGFR.
Additionally or alternatively, detection of a genetic biomarker can include a method of identifying a candidate treatment for a subject in need thereof, comprising: performing an immunohistochemistry (IHC) analysis on a sample from the subject to determine an IHC expression profile on at least five of: SPARC, PGP, Her2/neu, ER, PR, c-kit, AR, CD52, PDGFR, TOP2A, TS, ERCC1, RRM1, BCRP, TOPO1, PTEN, MGMT, and MRP1; performing a microarray analysis on the sample to determine a microarray expression profile on at least five of: ABCC1, ABCG2, ADA, AR, ASNS, BCL2, BIRC5, BRCA1, BRCA2, CD33, CD52, CDA, CES2, DCK, DHFR, DNMT1, DNMT3A, DNMT3B, ECGF1, EGFR, EPHA2, ERBB2, ERCC1, ERCC3, ESR1, FLT1, FOLR2, FYN, GART, GNRH1, GSTP1, HCK, HDAC1, HIF1A, HSP90AA1, IL2RA, HSP90AA1, KDR, KIT, LCK, LYN, MGMT, MLH1, MS4A1, MSH2, NFKB1, NFKB2, OGFR, PDGFC, PDGFRA, PDGFRB, PGR, POLA1, PTEN, PTGS2, RAF1, RARA, RRM1, RRM2, RRM2B, RXRB, RXRG, SPARC, SRC, SSTR1, SSTR2, SSTR3, SSTR4, SSTR5, TK1, TNF, TOP1, TOP2A, TOP2B, TXNRD1, TYMS, VDR, VEGFA, VHL, YES1, and ZAP70; performing a fluorescent in-situ hybridization (FISH) analysis on the sample to determine a FISH mutation profile on EGFR and/or HER2; performing DNA sequencing on the sample to determine a sequencing mutation profile on at least one of KRAS, BRAF, c-KIT and EGFR; and comparing the IHC expression profile, microarray expression profile, FISH mutation profile and sequencing mutation profile against a rules database. The rules database comprises a mapping of treatments whose biological activity is known against cancer cells that: i. overexpress or underexpress one or more proteins included in the IHC expression profile; ii. overexpress or underexpress one or more genes included in the microarray expression profile; iii. have no mutations, or one or more mutations in one or more genes included in the FISH mutation profile; and/or iv. have no mutations, or one or more mutations in one or more genes included in the sequencing mutation profile.
The candidate treatment is identified if: i. the comparison step indicates that the treatment should have biological activity against the cancer; and ii. the comparison step does not contraindicate the treatment for treating the cancer. In some embodiments, the IHC expression profiling is performed on at least 50%, 60%, 70%, 80% or 90% of the biomarkers listed. In some embodiments, the microarray expression profiling is performed on at least 50%, 60%, 70%, 80% or 90% of the biomarkers listed. Additionally or alternatively, detection of a genetic biomarker can include a method of identifying a candidate treatment for a cancer in a subject in need thereof, comprising: performing an immunohistochemistry (IHC) analysis on a sample from the subject to determine an IHC expression profile on at least the group of proteins consisting of: SPARC, PGP, Her2/neu, ER, PR, c-kit, AR, CD52, PDGFR, TOP2A, TS, ERCC1, RRM1, BCRP, TOPO1, PTEN, MGMT, and MRP1; performing a microarray analysis on the sample to determine a microarray expression profile on at least the group of genes consisting of ABCC1, ABCG2, ADA, AR, ASNS, BCL2, BIRC5, BRCA1, BRCA2, CD33, CD52, CDA, CES2, DCK, DHFR, DNMT1, DNMT3A, DNMT3B, ECGF1, EGFR, EPHA2, ERBB2, ERCC1, ERCC3, ESR1, FLT1, FOLR2, FYN, GART, GNRH1, GSTP1, HCK, HDAC1, HIF1A, HSP90AA1, IL2RA, HSP90AA1, KDR, KIT, LCK, LYN, MGMT, MLH1, MS4A1, MSH2, NFKB1, NFKB2, OGFR, PDGFC, PDGFRA, PDGFRB, PGR, POLA1, PTEN, PTGS2, RAF1, RARA, RRM1, RRM2, RRM2B, RXRB, RXRG, SPARC, SRC, SSTR1, SSTR2, SSTR3, SSTR4, SSTR5, TK1, TNF, TOP1, TOP2A, TOP2B, TXNRD1, TYMS, VDR, VEGFA, VHL, YES1, and ZAP70; performing a fluorescent in-situ hybridization (FISH) analysis on the sample to determine a FISH mutation profile on at least the group of genes consisting of EGFR and HER2; performing DNA sequencing on the sample to determine a sequencing mutation profile on at least the group of genes consisting of KRAS, BRAF, c-KIT and EGFR; and comparing the IHC expression profile, microarray expression profile, FISH
mutation profile and sequencing mutation profile against a rules database. The rules database comprises a mapping of treatments whose biological activity is known against cancer cells that: i. overexpress or underexpress one or more proteins included in the IHC expression profile; ii. overexpress or underexpress one or more genes included in the microarray expression profile; iii. have zero or more mutations in one or more genes included in the FISH mutation profile; and/or iv. have zero or more mutations in one or more genes included in the sequencing mutation profile. The candidate treatment is identified if: i. the comparison step indicates that the treatment should have biological activity against the cancer; and ii. the comparison step does not contraindicate the treatment for treating the cancer. In some embodiments, the microarray expression profiling is performed using a low density microarray, an expression microarray, a comparative genomic hybridization (CGH) microarray, a single nucleotide polymorphism (SNP) microarray, a proteomic array or an antibody array.
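The comparison against a rules database described above (a treatment is a candidate if the comparison indicates biological activity and does not contraindicate the treatment) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Rule` structure, the `candidate_treatments` function, the status strings, and the marker and treatment names (`marker_X`, `gene_Y`, `gene_Z`, `drug_A`, `drug_B`) are all hypothetical placeholders and do not reproduce the rules database of the referenced patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical entry in a rules database."""
    assay: str      # "IHC", "microarray", "FISH", or "sequencing"
    marker: str     # protein or gene name
    status: str     # e.g. "overexpressed", "underexpressed", "mutated"
    treatment: str  # treatment the rule refers to
    effect: str     # "indicated" or "contraindicated"


def candidate_treatments(profile, rules):
    """Return treatments that are indicated and not contraindicated.

    profile -- dict mapping (assay, marker) to the observed status
    rules   -- iterable of Rule objects
    """
    indicated = set()
    contraindicated = set()
    for rule in rules:
        # A rule applies only when the observed status matches it exactly.
        if profile.get((rule.assay, rule.marker)) == rule.status:
            if rule.effect == "indicated":
                indicated.add(rule.treatment)
            else:
                contraindicated.add(rule.treatment)
    return sorted(indicated - contraindicated)


# Hypothetical rules and profile, for illustration only
rules = [
    Rule("IHC", "marker_X", "overexpressed", "drug_A", "indicated"),
    Rule("sequencing", "gene_Y", "mutated", "drug_A", "contraindicated"),
    Rule("FISH", "gene_Z", "amplified", "drug_B", "indicated"),
]
profile = {
    ("IHC", "marker_X"): "overexpressed",
    ("sequencing", "gene_Y"): "mutated",
    ("FISH", "gene_Z"): "amplified",
}
selected = candidate_treatments(profile, rules)
```

In this example `drug_A` is indicated by one rule but contraindicated by another, so only `drug_B` is returned, mirroring the two-part identification condition in the text.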

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0171337, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method, comprising: a) determining a molecular profile for at least one sample from the subject by assessing a plurality of genes and/or gene products; and b) identifying, based on the molecular profile, at least one of: i) at least one treatment that is associated with benefit for treatment of the cancer; ii) at least one treatment that is associated with lack of benefit for treatment of the cancer; and iii) at least one treatment associated with a clinical trial. The plurality of genes and/or gene products can be chosen from amongst genes and/or gene products (e.g., transcripts and proteins) with efficacy known to be related to various chemotherapeutic agents. Additionally or alternatively, detection of a genetic biomarker can include mutational analysis performed on any desired panel of genes. In an embodiment, assessing the plurality of genes and/or gene products further comprises mutational analysis of at least one, e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45 or 46, of ABL1, AKT1, ALK, APC, ATM, BRAF, BRCA1, BRCA2, CDH1, CSF1R, CTNNB1, EGFR, ERBB2 (HER2), ERBB4 (HER4), FBXW7, FGFR1, FGFR2, FLT3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, JAK2, JAK3, KDR (VEGFR2), KIT (cKIT), KRAS, MET (cMET), MPL, NOTCH1, NPM1, NRAS, PDGFRA, PIK3CA, PTEN, PTPN11, RB1, RET, SMAD4, SMARCB1, SMO, STK11, TP53 and VHL. The mutational analysis may comprise any useful combination of these genes.
The mutational analysis can be used to assess at least one, e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 of a mutation, a polymorphism, a deletion, an insertion, a substitution, a translocation, a fusion, a break, a duplication, an amplification, a repeat, a copy number variation, a transcript variant, and a splice variant. The mutational analysis can be performed using any useful laboratory method or combination of methods. For example, the mutational analysis can be performed using at least one of ISH, amplification, PCR, RT-PCR, hybridization, microarray, sequencing, pyrosequencing, Sanger sequencing, high throughput or Next Generation sequencing (NGS), fragment analysis or RFLP. In some embodiments, the mutational analysis comprises Next Generation Sequencing.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication Nos. 2017/0175197, 2017/0039328, and 2015/0307947 and P.C.T. Publication No. WO 2012/092336, which are hereby incorporated by reference in their entirety.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/053915, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include systems, methods, apparatuses, and computer program products for providing a user interface for an application for analyzing biological data. Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing biological data, the method comprising: receiving, at a computing device comprising a processor and memory, patient data for a plurality of patients, the patient data corresponding to at least one of a biological sampling event, a biological processing event, at least one therapeutic regime, at least one biomarker status, and a patient status; determining at least one interrelationship between any one of the biological sampling event, the biological processing event, the at least one therapeutic regime, the at least one biomarker status, and the patient status; performing a therapeutic regime analysis to determine an interrelationship status for the interrelationship between at least one therapeutic regime and at least one of the patient status and the at least one biomarker status; and displaying at least one graphical interface on a user interface in communication with the computing device, the graphical interface including a plurality of visual elements, each visual element of the plurality of visual elements being associated with the patient data, at least one visual element being associated with the at least one interrelationship, at least one visual element including an indicium corresponding to at least one of the interrelationship status and the biomarker status.
Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing biological data associated with a biological sample from a target patient, the method comprising: receiving, at a computing device comprising a processor and memory, patient data associated with the target patient, the patient data corresponding to a biological sampling event, a biological processing event, a therapeutic regime, a marker status, and a patient status; receiving reference data associated with a plurality of patients, the reference data corresponding to a plurality of biological sampling events, biological processing events, therapeutic regimes, marker statuses, and patient statuses; determining at least one interrelationship between any one of the biological sampling events, the biological processing events, the therapeutic regimes, the marker statuses, and the patient statuses; performing a therapeutic regime analysis to determine the interrelationship between at least one therapeutic regime and at least one of the at least one patient status and the at least one marker status; displaying at least one graphical user interface, the graphical user interface configured to: i) display a plurality of graphical user interface objects associated with the reference data, ii) display a plurality of graphical user interface objects associated with the patient data, iii) display, on at least one graphical interface on a user interface in communication with the computing device, a primary graphical user interface object configured to, upon receiving an indication of a user input defining a selection of the primary graphical user interface object, cause the graphical user interface to display a secondary graphical user interface object; and assisting in providing patient care based on the one or more interrelationships displayed on the user interface.
In some embodiments, the method may further comprise manipulating a primary visual element to display a secondary visual element including additional information corresponding to the patient data upon selection thereof. The method may further comprise displaying the secondary visual element such that the secondary visual element overlays the primary visual element or the primary visual element is resized such that the secondary visual element is displayed adjacent to the primary visual element. In some embodiments, the method may further comprise assisting in providing patient care based on the one or more interrelationships displayed on the user interface. In some embodiments, assisting in providing the patient care comprises assisting in at least one of providing a diagnosis, providing a prognosis, selecting a recommended therapeutic regime, generating a hypothesis, and evaluating an efficiency of the therapeutic regime, based on the one or more interrelationships. In some embodiments, assisting in providing the patient care comprises selectively manipulating the graphical interface and one or more of the plurality of visual elements displayed thereon to visually compare a target patient against a set of reference patients. Visually comparing the target patient against the set of reference patients can be based on various desired attributes, including without limitation shared patient attributes, the at least one therapeutic regime, and/or the at least one biomarker status.
Additionally or alternatively, detection of a genetic biomarker can include a computer-readable storage medium that is non-transitory and has computer-readable program code portions stored therein that, in response to execution by a processor, cause an apparatus to at least: receive, at a computing device comprising the processor and memory, patient data for a plurality of patients, the patient data corresponding to at least one of a biological sampling event, a biological processing event, at least one therapeutic regime, at least one biomarker status, and a patient status; determine at least one interrelationship between any one of the biological sampling event, the biological processing event, the at least one therapeutic regime, the at least one biomarker status, and the patient status; perform a therapeutic regime analysis to determine an interrelationship status for the interrelationship between at least one therapeutic regime and at least one of the patient status and the at least one biomarker status; and display at least one graphical interface on a user interface in communication with the computing device, the graphical interface including a plurality of visual elements, each visual element of the plurality of visual elements being associated with the patient data, at least one visual element being associated with the at least one interrelationship, at least one visual element including an indicium corresponding to at least one of the interrelationship status and the biomarker status.
Additionally or alternatively, detection of a genetic biomarker can include an apparatus for analyzing biological data, the apparatus including a user interface, and a computing device in communication with the user interface, the computing device comprising a processor and memory including computer-readable program code stored therein, the computer-readable code configured, upon the execution thereof by the processor, to cause the apparatus to: receive patient data for a plurality of patients, the patient data corresponding to at least one of a biological sampling event, a biological processing event, at least one therapeutic regime, at least one biomarker status, and a patient status; determine at least one interrelationship between any one of the biological sampling event, the biological processing event, the at least one therapeutic regime, the at least one biomarker status, and the patient status; perform a therapeutic regime analysis to determine an interrelationship status for the interrelationship between at least one therapeutic regime and at least one of the patient status and the at least one biomarker status; and display at least one graphical interface on the user interface, the graphical interface including a plurality of visual elements, each visual element of the plurality of visual elements being associated with the patient data, at least one visual element being associated with the at least one interrelationship, at least one visual element including an indicium corresponding to at least one of the interrelationship status and the biomarker status.
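
By way of a non-limiting illustration only, the step of determining an interrelationship between a therapeutic regime and a patient status could be sketched as a simple tabulation over patient records. The record fields (`regime`, `status`), the status value `"responder"`, and both function names are hypothetical and are not taken from WO 2017/053915; the sketch omits the graphical interface entirely.

```python
from collections import Counter


def regime_status_table(records):
    """Count patients for each (therapeutic regime, patient status) pair.

    `records` is an iterable of dicts with hypothetical keys
    "regime" and "status".
    """
    return Counter((r["regime"], r["status"]) for r in records)


def status_rate(records, regime, status="responder"):
    """Fraction of patients on `regime` whose status equals `status`.

    Returns None when no patient received the regime.
    """
    on_regime = [r for r in records if r["regime"] == regime]
    if not on_regime:
        return None
    hits = sum(1 for r in on_regime if r["status"] == status)
    return hits / len(on_regime)


# Hypothetical patient data for three patients.
records = [
    {"regime": "A", "status": "responder"},
    {"regime": "A", "status": "non-responder"},
    {"regime": "B", "status": "responder"},
]
table = regime_status_table(records)
rate_a = status_rate(records, "A")
```

In such a sketch, the resulting counts or rates would be the kind of interrelationship status that a visual element could then display.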

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/064229, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include aptamers that bind biomarkers of interest. In some embodiments, oligonucleotide probes are used to detect the presence or levels of biomarkers or other biological entities in a biological sample.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/205686, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include oligonucleotide probes that recognize tissues having phenotypes of interest. In various embodiments, the oligonucleotide probes are used in diagnostic, prognostic or theranostic processes to characterize a phenotype of that sample. The diagnosis may be related to a cancer. Additionally or alternatively, detection of a genetic biomarker can include a method of enriching an oligonucleotide library comprising a plurality of oligonucleotides, comprising: (a) providing a support arrayed with a plurality of samples; (b) contacting the support with the plurality of oligonucleotides; and (c) recovering members of the oligonucleotide probe library that bound to members of the plurality of samples, thereby enriching the oligonucleotide probe library. Additionally or alternatively, detection of a genetic biomarker can include a method of enriching an oligonucleotide library comprising a plurality of oligonucleotides, the method comprising: (a) performing at least one round of positive selection, wherein the positive selection comprises: (i) simultaneously contacting a plurality of samples with the plurality of oligonucleotides; and (ii) recovering members of the plurality of oligonucleotides that associated with the plurality of samples; and (b) optionally performing at least one round of negative selection, wherein the negative selection comprises: (i) simultaneously contacting a plurality of control samples with the plurality of oligonucleotides; and (ii) recovering members of the plurality of oligonucleotides that did not associate with the plurality of control samples.
In embodiments of the methods of enrichment, the plurality of samples is chosen to be representative of a phenotype of interest. Additionally or alternatively, detection of a genetic biomarker can include a method of characterizing a phenotype in a sample comprising: (a) arraying at least one sample on a substrate; (b) contacting the substrate with a plurality of oligonucleotides; and (c) measuring a presence or level of a complex formed between members of the plurality of oligonucleotides and the samples arrayed on the substrate, wherein the presence or level is used to characterize the phenotype. Additionally or alternatively, detection of a genetic biomarker can include a kit comprising at least one reagent for carrying out the methods, including methods of enrichment and characterizing. Additionally or alternatively, detection of a genetic biomarker can include use of at least one reagent for carrying out the methods. The at least one reagent can be any useful reagent, including without limitation at least one of a support, a plurality of nucleotides, a filtration unit, and PEG.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 10,011,826, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of isolating/extracting in parallel various biomolecules, in particular nucleic acids and proteins, from the same fixed biological samples. In some embodiments, the methods also comprise quantifying and analyzing the biomolecules isolated by the method, a kit for isolating/extracting in parallel various biomolecules from a fixed sample, and using said kit for diagnosing, prognosing, deciding the therapy of and monitoring the therapy of a disease. In some embodiments, a method of parallel purification of various kinds of biomolecules from the same biological starting material fixed by crosslinking, comprises: a) a step of dissolving said crosslinking of the starting material, b) a step of separating the different biomolecules present in the starting material into at least one fraction (A) and at least one fraction (B), and c) isolating or detecting, or isolating and detecting different biomolecules from at least one of said fractions (A) and (B) of step b), and a kit for performing said method.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,797,000, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of detecting the presence of a target RNA, the method comprising: a) providing at least one DNA capture probe, wherein the at least one DNA capture probe is bound to a support; b) hybridizing the target RNA to said at least one DNA capture probe, yielding a target RNA:DNA capture probe complex; c) isolating the target RNA:DNA capture probe complex; d) providing at least one DNA amplification probe, and hybridizing said at least one DNA amplification probe to said target RNA:DNA capture probe complex, yielding a target RNA:DNA capture/amplification probe complex; e) providing an anti-RNA:DNA hybrid antibody, and incubating said target RNA:DNA capture/amplification probe complex with said antibody, yielding a target RNA:DNA:antibody complex; f) detecting said antibody, wherein said detecting indicates the presence of said target RNA. In some embodiments, the antibody is conjugated to a detectable marker, and the step of detecting comprises detecting the marker. In one aspect, the detectable marker is selected from the group consisting of alkaline phosphatase and horseradish peroxidase. In some embodiments, the step of detecting comprises providing a second antibody that binds to said anti-RNA:DNA hybrid antibody, wherein said second antibody is conjugated to a detectable marker, and wherein said detecting further comprises detecting the marker. In some embodiments, the support comprises a magnetic bead. In one aspect, the magnetic bead is conjugated to at least one streptavidin molecule, and the at least one DNA capture probe is conjugated to a biotin molecule.
Additionally or alternatively, detection of a genetic biomarker can include a method of providing target RNA for detection, the method comprising: incubating a biological sample containing the target RNA with carboxyl beads; isolating the beads; lysing the biological sample attached to the isolated beads; and isolating the beads from the lysed biological sample, wherein the resulting supernatant contains the target RNA for detection.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,689,047, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of detecting a target nucleic acid in a sample comprising non-target nucleic acids, said method comprising: (a) purifying the target nucleic acid from the sample by a method comprising: (i) contacting the sample with at least one purification probe, wherein at least a portion of the nucleic acid probe hybridizes to the at least one target nucleic acid to form a DNA:RNA hybrid; (ii) immobilizing the DNA:RNA hybrid to a first solid support by a method comprising contacting the DNA:RNA hybrid with at least a first antibody capable of binding to the DNA:RNA hybrid, wherein the antibody is bound to or adapted to be bound to the first solid support; and (iii) separating the first solid support from the sample to generate at least one purified target nucleic acid; and
(b) genotyping the purified target nucleic acid by a method comprising: (i) amplifying at least a portion of the purified target nucleic acid to generate an amplicon, such as by an isothermal amplification, such as whole genome amplification; (ii) immobilizing the amplicon to a second solid support by a method comprising contacting the amplicon with at least one immobilization probe, wherein: (α) the immobilization probe is bound to or adapted to be bound to the second solid support; and (β) at least a portion of the immobilization probe hybridizes to the at least one target nucleic acid; (iii) contacting the immobilized amplicon with at least one detection probe, wherein at least a portion of the detection probe hybridizes to the at least one target nucleic acid to generate a detection complex; and (iv) detecting at least a first detectable signal generated by the detection complex, wherein the detectable signal indicates the genotype of the target nucleic acid. In some embodiments, the plurality of purified target nucleic acids is contacted with a plurality of immobilization probes, wherein each of the plurality of immobilization probes is specific for a distinct purified target nucleic acid. Additionally or alternatively, detection of a genetic biomarker can include a method comprising:
a. a purifying step comprising: generating a double-stranded nucleic acid hybrid of the at least one target nucleic acid by hybridizing the at least one target nucleic acid to a hybrid probe set comprising at least a first nucleic acid probe specific for the at least one target nucleic acid; immobilizing the double-stranded nucleic acid hybrid to a first solid support by contacting the double-stranded nucleic acid hybrid with at least a first antibody capable of binding to the double-stranded nucleic acid hybrid and binding the at least a first antibody to the first solid support; and separating the double-stranded nucleic acid hybrid from the sample to generate at least one purified nucleic acid; b. an amplifying step, wherein at least a portion of the at least one purified nucleic acid is amplified to generate amplified nucleic acids; and c. a genotyping step comprising: immobilizing the amplified nucleic acids to at least a second solid support by hybridizing the amplified nucleic acids to an immobilization probe set comprising at least one polynucleotide probe specific for the at least one target nucleic acid; and detecting the presence of the at least one target nucleic acid with a detection probe set comprising at least one polynucleotide probe specific for the at least one target nucleic acid.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,422,593, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of detecting at least one target nucleic acid comprising: (1) sequence-specific isolation of a target nucleic acid from a sample; (2) amplifying the isolated target nucleic acid; and (3) detecting the target nucleic acid using a plurality of detectably labeled nucleic acid detection probes, wherein each detection probe (a) bears a different detectable label from the other detection probes, and/or (b) has a different melting temperature from probes bearing the same detectable label. In some embodiments, the method comprises: A. purifying the at least one target nucleic acid by a method comprising: A1. generating a double-stranded nucleic acid hybrid of the at least one target nucleic acid by hybridizing the at least one target nucleic acid to a hybrid probe set comprising at least a first nucleic acid probe specific for the at least one target nucleic acid; A2. separating the double-stranded nucleic acid hybrid from the sample to generate at least one purified nucleic acid; B. amplifying at least a portion of the at least one purified nucleic acid; and C. detecting the target nucleic acid by a method comprising: C1. contacting the amplified nucleic acid with at least one detection probe set, wherein: C1(a). each of the detection probes of the detection probe set bears a detectable label; C1(b). at least two of the detection probes of the detection probe set carry the same detectable label; and C1(c). each of the probes carrying the same detectable label has a melting temperature (Tm) which differs from the other probes with the same label; C2. detecting the amplified nucleic acid by determining whether the labeled probe has hybridized to its nucleic acid sequence; and
C3. detecting the temperature at which each detection probe dissociates from the nucleic acid sequence to which it has bound. In some embodiments, the double-stranded nucleic acid hybrid is separated from the sample by a method comprising contacting the double-stranded nucleic acid hybrid with a molecule that binds specifically to double-stranded nucleic acid hybrids, preferably an anti-DNA:RNA hybrid antibody.
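
As a non-limiting sketch of how probes carrying the same detectable label can be told apart by melting temperature, the following Python fragment checks that same-label probes have distinct Tm values and assigns an observed dissociation temperature to a probe. The probe names, the label names, the Tm values, and the `min_tm_gap` and `tolerance` parameters are all hypothetical and are not drawn from U.S. Pat. No. 9,422,593.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    name: str
    label: str          # detectable label (hypothetical channel name)
    tm_celsius: float   # nominal melting temperature (hypothetical value)


def same_label_tms_are_distinct(probes, min_tm_gap=2.0):
    """Return True if probes sharing a label differ in Tm by >= min_tm_gap."""
    by_label = {}
    for p in probes:
        by_label.setdefault(p.label, []).append(p.tm_celsius)
    for tms in by_label.values():
        tms.sort()
        if any(b - a < min_tm_gap for a, b in zip(tms, tms[1:])):
            return False
    return True


def identify_probe(probes, label, observed_tm, tolerance=1.0):
    """Return the name of the probe with this label whose Tm is closest to
    the observed dissociation temperature, or None if none is within
    `tolerance` degrees."""
    candidates = [
        p for p in probes
        if p.label == label and abs(p.tm_celsius - observed_tm) <= tolerance
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(p.tm_celsius - observed_tm)).name


# Hypothetical probe set: two probes share one label but differ in Tm.
probes = [
    Probe("P1", "label_1", 60.0),
    Probe("P2", "label_1", 66.0),
    Probe("P3", "label_2", 60.0),
]
```

Under these assumptions, a signal on `label_1` that dissociates near 66 °C would be attributed to `P2`, while a signal on `label_2` near 60 °C would be attributed to `P3`.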

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,593,366, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of randomly amplifying a target nucleic acid sequence, the method comprising, bringing into contact a set of primers, DNA polymerase, and a target sample, wherein the primers are random G-deficient primers, and incubating the target sample under conditions that promote replication of the target sequence, wherein replication of the target sequence results in replicated strands. Additionally or alternatively, detection of a genetic biomarker can include a method of randomly amplifying a target nucleic acid sequence, the method comprising, bringing into contact a set of primers, DNA polymerase, and a target sample, wherein the primers are random G-deficient primers, and incubating the target sample under conditions that promote replication of the target sequence, wherein nucleic acids in the target sample are not separated from other material in the target sample.
Additionally or alternatively, detection of a genetic biomarker can include a method of randomly amplifying messenger RNA, the method comprising, reverse transcribing messenger RNA to produce a first strand cDNA, bringing into contact a set of random G-deficient primers, DNA polymerase, and the first strand cDNA, and incubating under conditions that promote replication of the first strand cDNA, wherein replication of the first strand cDNA results in replicated strands, wherein during replication at least one of the replicated strands is displaced from the first strand cDNA by strand displacement replication of another replicated strand. Additionally or alternatively, detection of a genetic biomarker can include a method of randomly amplifying a target nucleic acid sequence, the method comprising: (a) mixing a set of random G-deficient primers with a target sample, to produce a primer-target sample mixture, and incubating the primer-target sample mixture under conditions that promote hybridization between the random G-deficient primers and the target sequence in the primer-target sample mixture, and (b) mixing DNA polymerase with the primer-target sample mixture, to produce a polymerase-target sample mixture, and incubating the polymerase-target sample mixture under conditions that promote replication of the target sequence, wherein replication of the target sequence results in replicated strands, wherein during replication at least one of the replicated strands is displaced from the target sequence by strand displacement replication of another replicated strand, wherein the target sequence is a nucleic acid sample of substantial complexity. Additionally or alternatively, detection of a genetic biomarker can include a method of randomly amplifying a whole genome, the method comprising, bringing into contact a set of random G-deficient primers, DNA polymerase, and a target sample, and incubating the target sample under conditions that promote replication of the target sequence.
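
For illustration only, a set of random G-deficient primer sequences could be enumerated in silico as sketched below. The sketch assumes that "G-deficient" means the primers contain no guanine at all, which is an interpretation rather than a definition taken from U.S. Pat. No. 9,593,366; the default primer length of six nucleotides, the function name, and the seed parameter are likewise hypothetical.

```python
import random

# Assumption: a G-deficient primer is drawn only from A, C, and T.
G_DEFICIENT_ALPHABET = "ACT"


def random_g_deficient_primers(count, length=6, seed=None):
    """Return `count` random primer sequences of `length` nucleotides,
    each containing no guanine. A seed makes the draw reproducible."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(G_DEFICIENT_ALPHABET) for _ in range(length))
        for _ in range(count)
    ]


primers = random_g_deficient_primers(count=10, length=6, seed=0)
```

Each generated sequence is a string over A, C, and T; the sketch says nothing about primer synthesis or reaction conditions.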

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,487,823, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions and a method for amplification of nucleic acid sequences of interest. In some embodiments, the method is based on strand displacement replication of the nucleic acid sequences by primers. In some embodiments, the method is a form of multiple displacement amplification (MDA) useful for amplifying genomic nucleic acid samples and other nucleic acid samples of high complexity. The method can be used to amplify such highly complex nucleic acid samples using only one or a limited number of primers. It has been discovered that one or a small number of primers can effectively amplify whole genomes and other nucleic acid samples of high sequence complexity. The primers are specially selected or designed to be able to prime and efficiently amplify the broad range of sequences present in highly complex nucleic acid samples despite the limited amount of primer sequence represented in the primers. The method generally involves bringing into contact one, a few, or more primers having specific nucleic acid sequences, DNA polymerase, and a nucleic acid sample, and incubating the nucleic acid sample under conditions that promote replication of nucleic acid molecules in the nucleic acid sample. Replication of the nucleic acid molecules results in replicated strands such that, during replication, the replicated strands are displaced from the nucleic acid molecules by strand displacement replication of another replicated strand. The replication can result in amplification of all or a substantial fraction of the nucleic acid molecules in the nucleic acid sample.
In some embodiments of the method, which uses a form of whole genome strand displacement amplification (WGSDA), one, a few, or more primers are used to prime a sample of genomic nucleic acid (or another sample of nucleic acid of high complexity). Additionally or alternatively, detection of a genetic biomarker can include a method of amplifying human genomes, the method comprising: bringing into contact a single DNA primer of at least 6 nucleotides in length which is non-degenerate and non-random, a non-human, strand displacement DNA polymerase, and a human genomic nucleic acid sample to form a mixture, and incubating the mixture under conditions that promote replication of nucleic acid molecules in the human genomic nucleic acid sample; wherein the primer hybridizes to nucleic acid molecules in the genomic nucleic acid sample, and wherein the primer has a specific nucleotide sequence, wherein the genomic nucleic acid sample comprises all or a substantial portion of a human genome; and replicating the nucleic acid molecules in the human genomic nucleic acid sample under isothermal conditions, wherein replication of nucleic acid molecules in the genomic nucleic acid sample proceeds by strand displacement replication, wherein replication of the nucleic acid molecules in the genomic nucleic acid sample results in replication of all or a substantial fraction of the nucleic acid molecules in the genomic nucleic acid sample.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,115,410, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a nucleic acid detection method, referred to as target-specific HYBRID CAPTURE (“TSHC”). Additionally or alternatively, detection of a genetic biomarker can include a method of detecting and/or quantifying one or more target nucleic acids, comprising the steps of target enrichment, amplification, and detection for the rapid and sensitive detection of the target nucleic acid sequences. In some embodiments, one or more target nucleic acids are detected by: capturing the target nucleic acids to a solid support by mixing the target nucleic acids, nucleic acid probes complementary to the target nucleic acids, wherein one is RNA and the other is DNA, and a solid support; removing unbound target nucleic acids and nucleic acid probes; amplifying the captured target nucleic acids or nucleic acid probes, forming a plurality of amplicons, where the presence of the amplicons is indicative of the presence of the target nucleic acids; and detecting the target nucleic acids by mixing the target nucleic acids with selectable and distinguishable oligonucleotides which hybridize to a portion of the target nucleic acids (i.e., capture sequence probes; CSPs) and nucleic acid probes complementary to a different portion of the target nucleic acids (i.e., signal sequence probes; SSPs), wherein either the probe or target is an RNA and the other is DNA, where DNA:RNA hybrids are detected by DNA:RNA hybrid-specific binding agents, which are directly or indirectly labeled, thereby detecting the target nucleic acids.
The SSPs are not limited to serving only as a means for producing a signal for detection, but may also be used in the target enrichment step by hybridizing to the target nucleic acid, enabling capture with a DNA:RNA hybrid-specific binding agent. In some embodiments, a plurality of target nucleic acids are detected by: hybridizing a plurality of target nucleic acids to nucleic acid probes which are complementary to the target nucleic acids, forming DNA:RNA hybrids; capturing the DNA:RNA hybrids with DNA:RNA hybrid-specific antibodies conjugated to solid supports; removing unbound target nucleic acids and nucleic acid probes; amplifying the captured target nucleic acids or nucleic acid probes, forming a plurality of amplicons, using random primers and DNA polymerase, where the presence of the plurality of amplicons is indicative of the presence of the target nucleic acids; hybridizing nucleic acid probes complementary to a portion of the target nucleic acid sequences, forming DNA:RNA hybrids between targets and probes; hybridizing oligonucleotides conjugated to a solid support to a different portion of the target nucleic acids, wherein the solid support is selectable; selecting the oligonucleotide complexes; and detecting the plurality of target nucleic acids by binding DNA:RNA hybrid-specific binding agents to the DNA:RNA hybrids.
In some embodiments, one or more target DNAs are detected by a multiplex method having the steps of: hybridizing a plurality of target DNAs to RNA probes which are complementary to the target DNAs, forming DNA:RNA hybrids; capturing the DNA:RNA hybrids with DNA:RNA hybrid-specific antibodies which are conjugated to beads; removing unbound nucleic acids and nucleic acid probes by washing excess nucleic acids and probes; isothermally amplifying the target DNAs using random primers and DNA polymerase, forming a plurality of amplicons; hybridizing RNA probes complementary to a portion of the target DNAs (i.e., SSPs), forming DNA:RNA hybrids; hybridizing specific DNA oligonucleotides to a different portion of the target DNAs, wherein the DNA oligonucleotides are conjugated to selectable beads; and detecting the plurality of target DNAs by binding detectably labeled DNA:RNA hybrid-specific antibodies to the DNA:RNA hybrids and selecting target DNA using selectable oligonucleotide-conjugated beads (i.e., CSPs), wherein the DNA:RNA hybrid-specific antibodies are detectably and distinguishably labeled. The presence of each target is detected by the labeled DNA:RNA antibody through SSPs which form DNA:RNA hybrids with the target, while the various targets are separated or selected based on the oligonucleotide-conjugated bead (i.e., CSP). The presence of amplicon and DNA:RNA hybrids is indicative of the presence of the target DNAs.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,051,606, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for making and using multicomponent Nucleic Acid Enzymes (MNAzymes). Additionally or alternatively, detection of a genetic biomarker can include a composition comprising at least two or more oligonucleotide components wherein at least a first oligonucleotide component and a second oligonucleotide component self-assemble in the presence of an MNAzyme assembly facilitator to form a catalytically active multi-component nucleic acid enzyme (MNAzyme), wherein each of said at least first and said second oligonucleotide components comprise a substrate arm portion, a catalytic core portion, and a sensor arm portion; wherein upon self-assembly, the sensor arm portion of said first and second oligonucleotide components act as sensor arms of the MNAzyme, the substrate arm portion of the first and second oligonucleotide components act as substrate arms of the MNAzyme, and the catalytic core portion of the first and second oligonucleotide components act as a catalytic core of the MNAzyme; and wherein the sensor arms of the MNAzyme interact with said MNAzyme assembly facilitator so as to maintain the first and second oligonucleotide components in proximity for association of their respective catalytic core portions to form the catalytic core of the MNAzyme, said catalytic core capable of modifying at least one substrate, and wherein said substrate arms of said MNAzyme engage a substrate so that said catalytic core of said MNAzyme can modify said substrate. In some embodiments, the composition may further comprise at least a third oligonucleotide component which acts to stabilize at least one of said substrate arm portions or sensor arm portions.
In some embodiments, the method mayfurther comprise at least a third oligonucleotide component and a fourtholigonucleotide component that self-assemble in the presence of at leastone additional assembly facilitator to form at least one additionalcatalytically active MNAzyme, wherein each of said at least third andfourth oligonucleotide components comprise a substrate arm portion, acatalytic core portion, and a sensor arm portion; wherein uponself-assembly of said at least a third oligonucleotide component and afourth oligonucleotide component, the sensor arm portion of said atleast third and said at least fourth oligonucleotide components formsensor arms of said at least one additional catalytically activeMNAzyme, the substrate arm portion of said at least third and said atleast fourth oligonucleotide components form substrate arms of said atleast one additional catalytically active MNAzyme, and the catalyticcore portion of said at least third and said at least fourtholigonucleotide components form a catalytic core of said at least oneadditional catalytically active MNAzyme; and wherein the sensor arms ofsaid at least one additional MNAzyme interact with said at least oneadditional assembly facilitator so as to maintain said at least thirdand said at least fourth oligonucleotide components in proximity forassociation of their respective catalytic core portions to form thecatalytic core of said at least one additional MNAzyme, said catalyticcore capable of acting on at least one additional substrate, and whereinthe substrate arms of said at least one additional MNAzyme engage atleast one additional substrate so that the catalytic core of said atleast one additional MNAzyme can act on said at least one additionalsubstrate. 
Additionally or alternatively, detection of a genetic biomarker can include a method for detecting the presence of at least one assembly facilitator comprising: (a) providing two or more oligonucleotide components, wherein at least a first oligonucleotide component and a second oligonucleotide component self-assemble in the presence of an assembly facilitator to form at least one catalytically active multi-component nucleic acid enzyme (MNAzyme); (b) contacting the two or more oligonucleotide components with a sample putatively containing the assembly facilitator under conditions permitting: (1) the self-assembly of said at least one catalytically active MNAzyme, and (2) the catalytic activity of said MNAzyme; and (c) determining the presence of the catalytic activity of said at least one MNAzyme, wherein the presence of the catalytic activity is indicative of the presence of said at least one assembly facilitator. In some embodiments, the method may further comprise a step of amplifying the assembly facilitator. 
The stepof amplifying may comprise one or more of: polymerase chain reaction(PCR), strand displacement amplification (SDA), loop-mediated isothermalamplification (LAMP), rolling circle amplification (RCA),transcription-mediated amplification (TMA), self-sustained sequencereplication (3 SR), nucleic acid sequence based amplification (NASBA),or reverse transcription polymerase chain reaction (RT-PCR).Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting the presence of at least one assemblyfacilitator comprising: (a) providing two or more oligonucleotidecomponents, wherein at least a first oligonucleotide component and asecond oligonucleotide component self-assemble in the presence of atleast a first assembly facilitator to form at least a firstcatalytically active multi-component nucleic acid enzyme (MNAzyme); (b)providing at least a first substrate, said first substrate capable ofbeing modified by said first MNAzyme, wherein said modification of saidsubstrate by said MNAzyme provides a detectable effect; (c) contactingsaid two or more oligonucleotide components with a sample putativelycontaining said at least first assembly facilitator under conditionspermitting: (1) the self-assembly of said at least first MNAzyme, and(2) the catalytic activity of said at least first MNAzyme; and (d)detecting said detectable effect. In some embodiments, the method mayfurther comprise the step of amplifying the nucleic acid. 
The step ofamplifying may comprise one or more of: polymerase chain reaction (PCR),strand displacement amplification (SDA), loop-mediated isothermalamplification (LAMP), rolling circle amplification (RCA),transcription-mediated amplification (TMA), self-sustained sequencereplication (3 SR), nucleic acid sequence based amplification (NASBA),or reverse transcription polymerase chain reaction (RT-PCR).Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting the presence of at least one targetcomprising: (a) providing two or more oligonucleotide components whereinat least a first oligonucleotide component and at least a secondoligonucleotide component are capable of self-assembly in the presenceof said target to form a catalytically active multi-component nucleicacid enzyme (MNAzyme); and wherein at least one of said first and saidsecond oligonucleotide components further comprises at least one aptamerportion; (b) contacting said oligonucleotide components with a sampleputatively containing said at least one target under conditionspermitting: (1) binding of said target to said aptamer portions and (2)catalytic activity of the MNAzyme; and (c) determining the presence ofthe catalytic activity of the MNAzyme, wherein the presence of thecatalytic activity is indicative of the presence of said target.Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting a target using an MNAzyme mediated signalamplification cascade comprising: (a) providing a first oligonucleotidecomponent and a second oligonucleotide component that self assemble inthe presence of said target to form a first catalytically activemulti-component nucleic acid enzyme (MNAzyme); (b) providing aninsoluble support having a first and a second substrate attachedthereto, said first and second substrates are capable of being modifiedby said first MNAzyme, wherein said first and second substrates compriseat least a third and a fourth 
oligonucleotide component respectively,capable of forming a second catalytically active MNAzyme, wherein saidthird and fourth oligonucleotide components are released uponmodification of said first and second substrates by said first MNAzyme;(c) providing said insoluble support having a third and a fourthsubstrate attached thereto, said third and fourth substrates are capableof being modified by said second MNAzyme, wherein said third and fourthsubstrates comprise at least a fifth and a sixth oligonucleotidecomponent respectively, capable of forming a third catalytically activeMNAzyme, wherein said fifth and said sixth oligonucleotide componentsare released upon modification of said third and fourth substrates bysaid second MNAzyme, and; (d) providing an assembly facilitator capableof facilitating the assembly of said second and said third MNAzyme, and;(e) providing a fifth substrate which is capable of being modified bysaid second MNAzyme to provide a detectable effect; (f) contacting saidfirst and second oligonucleotide components with a sample putativelycontaining said target, in the presence of said assembly facilitator,and in the presence of said insoluble support having said first, second,third and fourth substrates attached thereto under conditionspermitting: (1) self-assembly of said first, second and third, MNAzymes,and (2) catalytic activity of said first, second and third, MNAzymes;and (g) wherein said third MNAzyme modifies said first and secondsubstrates thereby further providing said second MNAzyme wherein saidsecond MNAzyme further modifies at least one of said third, fourth andfifth substrates thereby further providing said third MNAzyme therebyfurther providing said detectable effect, and; (h) wherein detection ofsaid detectable effect is indicative of the presence of said target.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,012,149, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for synthesis of a cDNA that contains the sequence of an miRNA or other small RNA that can be amplified using standard nucleic acid amplification methods such as the Polymerase Chain Reaction. The method can provide higher specificity of cDNA synthesis from small RNAs, while simultaneously permitting experimenters to carry out the two key enzymatic reactions necessary for this synthesis under substantially the same reaction conditions, conditions that include the presence of divalent cations at concentrations between 10 millimolar and 80 millimolar. When these reaction conditions are used as part of an assay for a small RNA, especially for an miRNA, greater specificity and sensitivity result. In some embodiments, the method for preparing a cDNA copy of a small RNA molecule comprises: (a) providing a small RNA from a biological sample, wherein said RNA is from 18 to 28 nucleotides in length; (b) incubating the small RNA with an enzyme capable of catalyzing the addition of nucleotides at the 3′ end of the small RNA in the presence of a single ribonucleotide triphosphate selected from the group consisting of ATP, GTP, UTP, and CTP and at a final concentration of divalent magnesium cation between 20 millimolar and 80 millimolar in a reaction to add nucleotides to the small RNA to generate a tailed small RNA; (c) annealing a DNA primer to the tailed small RNA whereby the DNA template extends from the 3′ end of the tailed small RNA, thereby providing a single stranded region of DNA that may be used to direct polymerization of deoxyribonucleotide triphosphates; and (d) incubating the annealed tailed small RNA and DNA primer in the presence of reverse transcriptase and deoxyribonucleotide triphosphates and at a final concentration of divalent magnesium cation between 20 millimolar and 80 millimolar under conditions allowing reverse transcription into cDNA and amplification of the annealed tailed small RNA to produce an amplification product.
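The tailing and reverse-transcription steps above can be sketched as simple string operations. This is only an illustrative model, not the method of the cited patent: the function names, the tail length of 20 nucleotides, the default magnesium concentration, and the example sequence are all assumptions introduced here, and the "between 20 millimolar and 80 millimolar" range is treated as inclusive for simplicity.

```python
# Illustrative sketch only. Function names, tail length, and default
# concentrations are hypothetical and not taken from the cited patent.

RNA_LENGTH_RANGE = (18, 28)   # nucleotides, as stated in the text
MG_RANGE_MM = (20.0, 80.0)    # final Mg2+ concentration in mM, as stated in the text
RNA_TO_CDNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def check_magnesium(mg_mm):
    """Raise ValueError if the Mg2+ concentration is outside the stated range."""
    if not MG_RANGE_MM[0] <= mg_mm <= MG_RANGE_MM[1]:
        raise ValueError(f"Mg2+ of {mg_mm} mM is outside {MG_RANGE_MM} mM")


def tail_small_rna(rna, nucleotide="A", tail_length=20, mg_mm=40.0):
    """Model step (b): append a homopolymer tail to the 3' end of a small RNA."""
    if not RNA_LENGTH_RANGE[0] <= len(rna) <= RNA_LENGTH_RANGE[1]:
        raise ValueError("small RNA must be 18 to 28 nucleotides long")
    if nucleotide not in ("A", "G", "U", "C"):
        raise ValueError("tail must use a single ribonucleotide: A, G, U, or C")
    check_magnesium(mg_mm)
    return rna + nucleotide * tail_length


def reverse_transcribe(tailed_rna, mg_mm=40.0):
    """Model step (d): return the first-strand cDNA (5'->3') of the tailed RNA."""
    check_magnesium(mg_mm)
    return "".join(RNA_TO_CDNA[base] for base in reversed(tailed_rna))
```

In this model, a poly(A)-tailed RNA yields a cDNA that begins with a run of T residues, which is where an oligo(dT)-containing primer would have annealed.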

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 8,962,250, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods of nucleic acid amplification and quantification such as amethod of amplifying a plurality of selected nucleic acid molecules froma pool of nucleic acid molecules comprising: (a) amplifying a pluralityof selected nucleic acid molecules in a first round multiplexamplification reaction including a plurality of outer primer pairs eachpair being specific for a selected nucleic acid sequence wherein theamplification reaction is allowed to proceed to a point prior to that atwhich significant competition between amplicons for reaction componentshas occurred; and (b) further amplifying the selected nucleic acidmolecules in a plurality of second round amplification reactions, eachincluding a portion of the completed multiplex reaction as a templateand at least one pair of inner primers each pair being specific for oneof the selected nucleic acid sequences such that each second roundreaction further amplifies a subset of the plurality of selected nucleicacid molecules respectively. 
Additionally or alternatively, detection ofa genetic biomarker can include a method of amplifying a plurality ofselected nucleic acid molecules from a pool of nucleic acid moleculescomprising: (a) amplifying a plurality of selected nucleic acidmolecules in a first round multiplex amplification reaction including aplurality of outer primer pairs each pair being specific for a selectednucleic acid sequence wherein the amplification reaction is allowed toproceed to a point prior to that at which significant competitionbetween amplicons for reaction components has occurred; and (b) furtheramplifying the selected nucleic acid molecules in a plurality of secondround amplification reactions, each including a portion of the completedmultiplex reaction as a template and at least one pair of primers eachpair comprising an inner primer and one of the outer primers and beingspecific for one of the selected nucleic acid sequences such that eachsecond round reaction further amplifies a subset of the plurality ofselected nucleic acid molecules respectively. 
Additionally oralternatively, detection of a genetic biomarker can include a method ofestimating the number of selected nucleic acid molecules from a pool ofnucleic acid molecules comprising: (a) amplifying a plurality ofselected nucleic acid molecules in a first round multiplex amplificationreaction including a plurality of outer primer pairs each pair beingspecific for a selected nucleic acid sequence wherein the amplificationreaction is allowed to proceed to a point prior to that at whichsignificant competition between amplicons for reaction components hasoccurred; (b) further amplifying the selected nucleic acid molecules ina plurality of second round amplification reactions, each including adetectible reporter, a portion of the completed multiplex reaction as atemplate and at least one pair of inner primers each pair being specificfor one of the selected nucleic acid sequences whereby each second roundreaction further amplifies a subset of the plurality of selected nucleicacid molecules respectively; and (c) monitoring each second roundamplification reaction by means of the detectible reporter such that thenumber of selected nucleic acid molecules of each selected sequence isestimated. 
Additionally or alternatively, detection of a geneticbiomarker can include a method of estimating the number of selectednucleic acid molecules from a pool of nucleic acid molecules comprising:(a) amplifying a plurality of selected nucleic acid molecules in a firstround multiplex amplification reaction including a plurality of outerprimer pairs each pair being specific for a selected nucleic acidsequence wherein the amplification reaction is allowed to proceed to apoint prior to that at which significant competition between ampliconsfor reaction components has occurred; and (b) further amplifying theselected nucleic acid molecules in a plurality of second roundamplification reactions, each including a detectible reporter, a portionof the completed multiplex reaction as a template and at least one pairof primers each pair comprising an inner primer and one of the outerprimers and being specific for one of the selected nucleic acidsequences such that each second round reaction further amplifies asubset of the plurality of selected nucleic acid molecules respectively;and (c) monitoring each second round amplification reaction by means ofthe detectible reporter such that the number of selected nucleic acidmolecules of each selected sequence is estimated. In some embodiments,the fully nested form of the Multiplex Tandem-Polymerase Chain Reaction(MT-PCR) method is used according to the first and third aspects,whereby each selected nucleic acid molecule is amplified using a pair ofouter primers in the first round of amplification and two inner primersin the second round of amplification. 
In some embodiments, thehemi-nested MT-PCR method is used according to the second and fourthaspects, whereby each selected nucleic acid molecule is amplified usinga pair of outer primers in the first round of amplification and theselected nucleic acid sequence is amplified further in the second roundof amplification using a pair of primers comprising one of the outerprimers used in the first round of amplification paired with one innerprimer. In some embodiments, the second round amplification reactionincludes a plurality of primer pairs and a plurality of fluorescentprobes such that a plurality of selected nucleic acid molecules of eachselected sequence are amplified and quantified by means of thefluorescent probes each being specific for a selected nucleic acidsequence. In some embodiments, at least one of the outer primersincludes UTP nucleotides whereby the primer is amenable to digestion bya UNG enzyme and the outer primers are removed at the end of the firstround of amplification by digestion with a UNG enzyme therebysubstantially preventing contamination of the second round amplificationreaction by the first round primers. In some embodiments, the methodsare used in a method of detecting polymorphisms, mutations, insertionsand deletions. Additionally or alternatively, detection of a geneticbiomarker can include a method of identifying and/or quantifying atleast one selected nucleic acid sequence including the steps of: (i)mixing one or more selected nucleic acid sequences with one or moredetectible reporters; (ii) generating a melting curve by measuring thesignal generated by said one or more detectible reporters; (iii)identifying and/or quantifying said one or more selected nucleic acidsequences from said melting curve.
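The melting-curve steps (i) to (iii) above can be illustrated with a short sketch. The text does not specify how a sequence is identified from the curve; a common approach, assumed here, is to take the melting temperature as the point where the negative first derivative of the signal with respect to temperature is largest. The function name and the example data in the usage note are hypothetical.

```python
# Illustrative sketch only. The negative-first-derivative peak is an assumed
# analysis choice; the cited patent text does not prescribe it.

def melting_temperature(temps, fluorescence):
    """Return the midpoint of the temperature interval where -dF/dT is largest.

    temps        -- strictly increasing temperatures (e.g., in degrees Celsius)
    fluorescence -- reporter signal measured at each temperature
    """
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least two points")
    best_tm = None
    best_slope = float("-inf")
    for i in range(len(temps) - 1):
        dt = temps[i + 1] - temps[i]
        if dt <= 0:
            raise ValueError("temperatures must be strictly increasing")
        # Signal falls as the duplex melts, so the negative slope peaks at Tm.
        slope = -(fluorescence[i + 1] - fluorescence[i]) / dt
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i + 1]) / 2
    return best_tm
```

For example, with temperatures `[70, 72, 74, 76, 78, 80]` and signals `[100, 98, 95, 60, 12, 10]`, the steepest drop lies between 76 and 78, so the function returns 77.0. Comparing such a value against the expected melting temperature of each amplicon is one way the identification in step (iii) could be carried out.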

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,877,436, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for determining the presence of a target nucleic acid molecule in a sample containing biological material. In some embodiments, the method for determining the presence of a target nucleic acid molecule in a sample comprises: a) suspending the sample in a collection medium; b) releasing target nucleic acid molecules from the sample into the collection medium; c) converting double-stranded target nucleic acid molecules to single-stranded target nucleic acid molecules; d) contacting one or more probes with the single-stranded target nucleic acid molecules under conditions that allow the probes and single-stranded target nucleic acid molecules to hybridize, forming double-stranded nucleic acid hybrids; e) capturing the double-stranded nucleic acid hybrids; f) separating the double-stranded nucleic acid hybrids from unbound single-stranded target nucleic acid molecules; and g) detecting the double-stranded nucleic acid hybrids, thereby indicating the presence of the target nucleic acid. In some embodiments, the detection method may be fully automated or partially automated (in other words, requiring some human input). In some embodiments, target nucleic acid molecules are detected in multiple samples at the same time or within a very short period of time, for example in a machine or a series of machines. Additionally or alternatively, detection of a genetic biomarker can include a collection medium into which a sample containing a target nucleic acid molecule is collected. The target nucleic acid molecule can be kept in the collection medium with minimal degradation of the target nucleic acid molecule over a time period of weeks or months. 
In some embodiments, DNA-based target sample material can be kept in the collection medium with minimal degradation of the target nucleic acid molecule over a time period of weeks or months. In some embodiments, the detergent-based collection medium allows for the rapid analysis and processing of a sample. In some embodiments, the collection medium comprises about 0.5% to about 2.0% NP-40, about 0.10% to about 0.40% sodium deoxycholate, about 25 mM to about 75 mM Tris-HCl, about 10 mM to about 50 mM EDTA, about 50 mM to about 200 mM NaCl, and about 0.01% to about 0.10% sodium azide.
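The component ranges listed above can be captured in a small lookup table for checking a candidate formulation. This is a minimal sketch: the dictionary keys, the function name, and the treatment of each "about" range as a hard inclusive bound are assumptions made for illustration only.

```python
# Illustrative sketch only. Ranges are copied from the text; treating "about"
# as a strict inclusive bound is a simplifying assumption.

COLLECTION_MEDIUM_RANGES = {
    "NP-40 (%)": (0.5, 2.0),
    "sodium deoxycholate (%)": (0.10, 0.40),
    "Tris-HCl (mM)": (25, 75),
    "EDTA (mM)": (10, 50),
    "NaCl (mM)": (50, 200),
    "sodium azide (%)": (0.01, 0.10),
}


def out_of_range_components(recipe):
    """Return the names of components that are missing or outside their range."""
    problems = []
    for name, (low, high) in COLLECTION_MEDIUM_RANGES.items():
        value = recipe.get(name)
        if value is None or not low <= value <= high:
            problems.append(name)
    return problems
```

A recipe whose values all fall inside the listed ranges returns an empty list; a recipe with, say, 300 mM NaCl returns `["NaCl (mM)"]`.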

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,372,637, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of stabilizing a biological sample that has the following steps:

a) preparation of a biological sample, and b) contacting the biological sample with a composition having a substance according to the following structural formula:

in which R1 is a hydrogen residue or a methyl residue, R2 and R3 areidentical or different hydrocarbon residues with a length of the carbonchain of 1-20, and R4 is an oxygen, sulfur or selenium residue. Thehydrocarbon residues R2 and/or R3 can be selected independently of oneanother from the group comprising alkyl, long-chain alkyl, alkenyl,alkoxy, long-chain alkoxy, cycloalkyl, aryl, haloalkyl, alkylsilyl,alkylsilyloxy, alkylene, alkenediyl, arylene, carboxylates and carbonyl.In some embodiments, the chain length n on R2 and/or R3 can have thevalues 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19and 20. In some embodiments, R2 and R3 have lengths of the carbon chainof 1-10. In some embodiments, R2 and R3 have lengths of the carbon chainof 1-5. In this case the chain length n can in particular have thevalues 1, 2, 3, 4 and 5. In some embodiments, methyl and ethyl residuesare used on R2 and/or R3. The chain length n then has values of 1 and 2.In some embodiments, the substance used in the composition can be usedas the sole agent for preservation. In some embodiments, the substancecan also be used in conjunction with other preserving substances or evenonly as an additive to other preserving substances in the composition.In some embodiments, the volume ratio or weight ratio between thesubstance and one or more other preserving substances in the compositioncan be in the range from 0.01:100 to 100:0. Preferably it is in therange from 0.1:100 to 100:0 and especially preferably it is in the rangefrom 1:100 to 100:0 and particularly preferably it is in the range from5:100 to 100:0. In some embodiments, the class of substances used is forexample dialkylacetamides (if R1 is a methyl residue) ordialkylformamides (if R1 is a hydrogen residue). 
In some embodiments,the biological sample is a material selected from the group comprisingsample material, plasma, body fluids, blood, serum, cells, leukocytefractions, crusta phlogistica, sputum, saliva, urine, semen, feces,forensic samples, smears, aspirates, biopsies, tissue samples, tissueparts and organs, food samples, environmental samples, plants and plantparts, bacteria, viruses, viroids, prions, yeasts and fungi, andfragments or constituents of the aforementioned materials, and/orisolated, synthetic or modified proteins, nucleic acids, lipids,carbohydrates, metabolic products and/or metabolites. In someembodiments, the substance is a substance selected from the groupcomprising N,N-dimethylacetamide, N,N-diethylacetamide,N,N-dimethylformamide, N,N-diethylformamide, N,N-dimethylthioformamideand N,N-diethylthioformamide. Their structural formulas are as follows:

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 8,043,834, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includecompositions and methods useful for labeling and detection of analytes.In some embodiments, the compositions are associations of threecomponents: reporter binding agents, amplification target circles, andDNA polymerase. The compositions are assembled prior to their use in arolling circle amplification reaction and can be stored and transportedprior to use without substantial loss of activity. In some embodiments,the composition comprises a reporter binding agent, an amplificationtarget circle, and DNA polymerase, wherein the reporter binding agentcomprises a specific binding molecule and a rolling circle replicationprimer, wherein the specific binding molecule is specific for a targetmolecule, wherein the specific binding molecule is not bound to thetarget molecule, wherein the composition does not comprise tandemsequence DNA, wherein the reporter binding agent, the amplificationtarget circle, and the DNA polymerase are directly associated with eachother and form a complex, and wherein the composition is not in an assayor reaction. In some embodiments, the composition can be used asuniversal reagents of rolling circle amplification. For this purpose,the compositions can have specific binding molecules that can interactwith particular moieties or molecules that are present on, or are usedto label any target molecule of interest. For example, the specificbinding molecule in the reagent composition can be streptavidin oranother biotin-specific molecule (such as an anti-biotin antibody). Anytarget molecule labeled with biotin can then be associated with thereagent composition and labeled and/or detected via rolling circleamplification mediated by the composition. 
This reagent composition can be used with any biotinylated target molecule. Similarly, an antibody specific to a class of antibodies (for example, an anti-mouse antibody) can be used as the specific binding molecule in reagent compositions. Such reagent compositions can be used to label and detect a class of antibodies in an assay. For example, the reagent composition can be used to label and detect all mouse antibodies bound to antigen in immunoassays regardless of the specificity of the individual mouse antibodies. This is analogous to the use of antibodies specific to a class of antibodies in sandwich immunoassays. The reagent compositions provide greater signal amplification and tighter localization of the signal than in traditional immunoassays.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,682,790, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions for the isolation and/or stabilisation of nucleic acids in materials of biological origin. The compositions contain as an essential ingredient a cationic compound of general formula

Y+R1R2R3R4X−

wherein Y may denote nitrogen or phosphorus; R1, R2, R3 and R4 independently of one another may denote a branched or unbranched C1-C20-alkyl group and/or a C6-C20-aryl group as well as a C6-C26-aralkyl group; and X− may represent an anion of an inorganic or organic, mono- or polybasic acid, and at least one proton donor as additive. In some embodiments, the cationic compounds consist of an ammonium salt wherein R1 denotes a higher alkyl group, preferably with 12, 14 or 16 carbon atoms, and R2, R3 and R4 in each case denote a methyl group. In some embodiments, R1 denotes an aralkyl group, preferably a benzyl group, R2 denotes a higher alkyl group, preferably with 12, 14 or 16 carbon atoms, and R3 and R4 denote a methyl group. In some embodiments, the anion is bromide, chloride, phosphate, sulphate, formate, acetate, propionate, oxalate or succinate. In some embodiments, aliphatic hydroxy-di- and -tricarboxylic acids, e.g., tartronic acid, D-(+), L-(−) or DL-malic acid, (2R,3R)-(+)-tartaric acid, (2S,3S)-(−)-tartaric acid, meso-tartaric acid and citric acid, are used. In some embodiments, aliphatic ketodicarboxylic acids may also be used as additives, e.g., mesoxalic acid and oxaloacetic acid, of which oxaloacetic acid is most particularly preferred. In some embodiments, amino acids may be used, of which α-amino acids, e.g., aminoacetic acid (glycine), α-aminopropionic acid (alanine), α-amino-iso-valeric acid (valine), α-amino-iso-caproic acid (leucine) and α-amino-β-methylvaleric acid (isoleucine), are preferred. As further additives, mineral acids and their salts may also be used. 
Preferably, the salts of mineral acids (such as phosphoric acid or sulphuric acid) with alkali metals or the ammonium salts thereof are used. Phosphoric acid and ammonium sulphate are most preferably used. Additionally or alternatively, detection of a genetic biomarker can include a method of stabilizing nucleic acids in a biological sample, the method comprising: mixing a storage stabilization composition with a solution containing the nucleic acids, wherein the composition comprises a cationic compound of the general formula

Y+R1R2R3R4X−

wherein Y represents nitrogen or phosphorus; R1, R2, R3 and R4, independently, represent a branched or unbranched C1-C20-alkyl group and/or a C6-C20-aryl group as well as a C6-C26-aralkyl group; X− represents an anion of an inorganic or organic, mono- or polybasic acid; and at least one proton donor; wherein the proton donor is present in the composition in a concentration of above 50 mM to saturation and wherein the proton donor is selected from the group consisting of saturated aliphatic monocarboxylic acids, unsaturated alkenyl-carboxylic acids, saturated and/or unsaturated aliphatic C2-C6-dicarboxylic acids, aliphatic hydroxyl-di- and tricarboxylic acids, aliphatic ketocarboxylic acids, amino acids or the inorganic acids or the salts thereof, on their own or in combination; stabilizing the nucleic acids, wherein the nucleic acids are stabilized by forming an ionic complex with the cationic compound; optionally separating the insoluble ionic complex from the solution; and optionally releasing the nucleic acids from the insoluble ionic complex.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,683,035, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include (1) a method of stabilizing and/or isolating nucleic acids from a biological sample, comprising the following step: contacting the biological sample with at least one cationic compound of formula (I):

wherein conjugated bases of strong and/or weak inorganic and/or organic acids are used as anion (A), and wherein the substance consisting of (I) and the anion is neutral in charge on the whole, and wherein X represents nitrogen (N) or phosphorus (P), k represents the integer 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24, Bk represents aliphatic alkanediyl bridges, which may be substituted on none, on one or more carbon atoms, and wherein one or more non-adjacent carbon atoms may be replaced by oxygen, and which have the structure

—(CH2)n-(OCH2)m-

wherein n and m independently represent the integer 0, 1, 2, 3, 4, 5, or 6, with n+m>0; alternatively, Bk represents a substituted phenyl, naphthyl or biphenyl bridge, which, in addition, may be substituted on one or more carbon atoms and has the structure

wherein n, m, l, p, q independently represent the integer 0, 1, 2, 3, 4, 5, or 6; R1, R2, R3k, which may be identical or different and which may be unsubstituted or substituted on one or more carbon atoms, represent hydrogen, linear or branched C1-C6 alkyl, linear or branched C1-C6 alkenyl, linear or branched C1-C6 alkynyl, phenyl, benzyl, and phenoxyethyl having the structure

wherein n, m independently represent the integer 0, 1, 2, 3, 4, 5, or 6, and Z represents one of the structures —O—, —CO—, —CO2-, —OCO—, —CO—N—, —N—CO—, —O—CO—N—, —N—CO—O—, —S—, or —S—S—; or R1, R2, R3k represent phenyl, benzyl, phenoxyethyl having the structure

wherein n, m independently represent the integer 0, 1, 2, 3, 4, 5, or 6; RA, RBk, RC, which may be identical or different and which may be unsubstituted or substituted on one or more carbon atoms, represent hydrogen, linear or branched C1-C21 alkyl, linear or branched C1-C21 alkenyl, linear or branched C1-C21 alkynyl, and a structure

CH3-(CH2)n-Z—(CH2)m-

wherein n, m independently represent the integer 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24, and Z represents —O—, —CO—, —CO2-, —OCO—, —CO—N—, —N—CO—, —O—CO—N—, —N—CO—O—, —S—, or —S—S—; alternatively, RA and RC together form a residue RAC having a cyclic structure

wherein the residue RAC, which may be unsubstituted or substituted on one or more carbon atoms, represents linear or branched C1-C8 alkyl, linear or branched C1-C8 alkenyl, or linear or branched C1-C8 alkynyl, and if k>1, the bridging groups Bk and the groups RBk and R3k are the same or different; (2) a kit for stabilizing and/or isolating nucleic acids, comprising at least one cationic compound as defined above by formula (I); (3) a complex, comprising a nucleic acid and at least one cationic compound, formed as the result of the method in (1); (4) a composition of matter, comprising at least one cationic compound as defined above by formula (I); (5) a pharmaceutical composition, comprising the composition of matter in (4); (6) a diagnostic composition, comprising the composition of matter in (4); and (7) a composition for research, comprising the composition of matter in (4).

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,323,310, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include an amplification reaction mixture for selectively amplifying RNA from a target RNA template comprising: a cellular RNA-dependent RNA-polymerase and at least two primers complementary to said target RNA template. In some embodiments, the RNA-dependent RNA-polymerase (RdRp) is a tomato or other cellular RdRp. In some embodiments, the RdRp is a cellular RdRp, and in a particularly preferred embodiment, the RdRp is selected from the group consisting of tomato RdRp, tobacco RdRp, cucumber RdRp, and wheat RdRp. In some embodiments, the amplification reaction mixture comprises a cellular RdRp and at least two primers complementary to a target RNA template. In a preferred embodiment, the reaction mixtures further comprise an RNA helicase, an energy source and optionally a divalent cation such as, e.g., Mg2+, Mn2+ or Co2+. The amplification reaction mixtures may further include RNase inhibitors, RNA stabilizing agents, single-stranded binding proteins, rNTPs and analogs of rNTPs, for the amplification of target RNA into product RNA. An amplification buffer is also provided which is supportive of both RdRp and RNA helicase activity. Additionally or alternatively, detection of a genetic biomarker can include a method for selectively amplifying RNA from an RNA template, comprising: contacting the template RNA with an amplification reaction mixture as described above; and incubating said amplification reaction mixture to produce amplified RNA product, wherein said incubation step comprises at least one denaturation condition.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,977,153, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method involving synthesizing first strand cDNA molecules from RNA molecules, circularizing the first strand cDNA molecules and replicating the circularized first strand cDNA molecules using rolling circle replication. Additionally or alternatively, detection of a genetic biomarker can include a method of amplifying RNA sequences, the method comprising: incubating a cDNA primer and an RNA sample comprising RNA molecules under conditions that promote synthesis of first strand cDNA molecules from the RNA molecules; incubating a circularization probe and the first strand cDNA molecules under conditions that promote circularization of the first strand cDNA molecules; and incubating the circularized first strand cDNA molecules under conditions that promote rolling circle replication of the circularized first strand cDNA molecules, thereby amplifying RNA sequences.
Additionally or alternatively, detection of a genetic biomarker can include a method of amplifying RNA sequences, the method comprising: incubating a cDNA primer and an RNA sample comprising RNA molecules under conditions that promote synthesis of first strand cDNA molecules from the RNA molecules, wherein the conditions that promote synthesis of first strand cDNA molecules comprise incubating the cDNA primer and the RNA sample in the presence of a reverse transcriptase; incubating the first strand cDNA molecules in the presence of an RNAse H activity; incubating the first strand cDNA molecules under alkaline conditions; neutralizing the first strand cDNA molecules; purifying the first strand cDNA molecules; incubating a circularization probe and the first strand cDNA molecules under conditions that promote circularization of the first strand cDNA molecules, wherein the conditions that promote circularization of the first strand cDNA molecules comprise incubating the circularization probe and the first strand cDNA molecules in the presence of ligase; incubating the circularized first strand cDNA molecules under conditions that promote rolling circle replication of the circularized first strand cDNA molecules, thereby amplifying RNA sequences, wherein the conditions that promote rolling circle replication of the circularized first cDNA molecules comprise incubating the circularized first strand cDNA molecules in the presence of a DNA polymerase.
Additionally or alternatively, detection of a genetic biomarker can include a method of amplifying RNA sequences, the method comprising: incubating a cDNA primer and an RNA sample comprising RNA molecules under conditions that promote synthesis of first strand cDNA molecules from the RNA molecules; incubating a circularization probe and the first strand cDNA molecules under conditions that promote ligation of the first strand cDNA molecules to each other to form first strand cDNA concatemers; and incubating the first strand cDNA concatemers under conditions that promote strand displacement replication of the first strand cDNA concatemers, thereby amplifying RNA sequences.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,815,212, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods provided for detecting the binding of a first member to a second member of a ligand pair, comprising the steps of (a) combining a set of first tagged members with a biological sample which may contain one or more second members, under conditions, and for a time sufficient to permit binding of a first member to a second member, wherein said tag is correlative with a particular first member and detectable by non-fluorescent spectrometry, or potentiometry, (b) separating bound first and second members from unbound members, (c) cleaving the tag from the tagged first member, and (d) detecting the tag by non-fluorescent spectrometry, or potentiometry, and therefrom detecting the binding of the first member to the second member. A wide variety of first and second member pairs may be utilized, including for example, nucleic acid molecules (e.g., DNA, RNA, nucleic acid analogues such as PNA, or any combination of these), proteins or polypeptides (e.g., an antibody or antibody fragment (e.g., monoclonal antibody, polyclonal antibody, or a binding partner such as a CDR)), oligosaccharides, hormones, organic molecules and other substrates (e.g., xenobiotics such as glucuronidase-drug molecule), or any other ligand pair. In some embodiments, the first and second members may be the same type of molecule or of different types. For example, representative first member/second member ligand pairs include: nucleic acid molecule/nucleic acid molecule; antibody/nucleic acid molecule; antibody/hormone; antibody/xenobiotic; and antibody/protein.
Additionally or alternatively, detection of a genetic biomarker can include methods for analyzing the pattern of gene expression from a selected biological sample, comprising the steps of (a) exposing nucleic acids from a biological sample, (b) combining the exposed nucleic acids with one or more selected tagged nucleic acid probes, under conditions and for a time sufficient for said probes to hybridize to said nucleic acids, wherein the tag is correlative with a particular nucleic acid probe and detectable by non-fluorescent spectrometry, or potentiometry, (c) separating hybridized probes from unhybridized probes, (d) cleaving the tag from the tagged fragment, and (e) detecting the tag by non-fluorescent spectrometry, or potentiometry, and therefrom determining the pattern of gene expression of the biological sample. Within one embodiment, the biological sample may be stimulated with a selected molecule prior to the step of exposing the nucleic acids. Representative examples of “stimulants” include nucleic acid molecules, recombinant gene delivery vehicles, organic molecules, hormones, proteins, inflammatory factors, cytokines, drugs, drug candidates, paracrine and autocrine factors, and the like. Within further embodiments, the tag(s) may be detected by fluorometry, mass spectrometry, infrared spectrometry, ultraviolet spectrometry, or potentiostatic amperometry (e.g., utilizing coulometric or amperometric detectors).
Representative examples of suitable spectrometric techniques include time-of-flight mass spectrometry, quadrupole mass spectrometry, magnetic sector mass spectrometry and electric sector mass spectrometry. Specific embodiments of such techniques include ion-trap mass spectrometry, electrospray ionization mass spectrometry, ion-spray mass spectrometry, liquid ionization mass spectrometry, atmospheric pressure ionization mass spectrometry, electron ionization mass spectrometry, fast atom bombardment ionization mass spectrometry, MALDI mass spectrometry, photo-ionization time-of-flight mass spectrometry, laser droplet mass spectrometry, MALDI-TOF mass spectrometry, APCI mass spectrometry, nano-spray mass spectrometry, nebulised spray ionization mass spectrometry, chemical ionization mass spectrometry, resonance ionization mass spectrometry, secondary ionization mass spectrometry and thermospray mass spectrometry.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,361,940, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include compositions and methods to increase the specificity of hybridization of nucleic acids and priming of nucleic acids in PCR. In some embodiments, the composition comprises a nucleic acid and a salt, the salt comprising an anion and a cation, the anion selected from halogenated acetate, propionate and halogenated propionate, the cation selected from primary, secondary and tertiary ammonium comprising 1-36 carbon atoms, and quaternary ammonium comprising 4-48 carbon atoms. In some embodiments, the composition is non-flowing and comprises an oligonucleotide of 6-100 nucleotides and a salt, the salt comprising an anion and a cation, the anion selected from acetate, halogenated acetate, propionate, and halogenated propionate, the cation selected from primary, secondary and tertiary ammonium comprising 1-36 carbon atoms, and quaternary ammonium comprising 4-48 carbon atoms. In some embodiments, the composition is free from organic solvent and comprises an oligonucleotide of 6-100 nucleotides and a salt, the salt comprising an anion and a cation, the anion selected from acetate, halogenated acetate, propionate, and halogenated propionate, the cation selected from primary, secondary and tertiary ammonium comprising 1-36 carbon atoms, and quaternary ammonium comprising 4-48 carbon atoms. In some embodiments, the composition comprises a nucleic acid and a salt, the nucleic acid immobilized on a solid support, the salt comprising an anion and a cation, the anion selected from acetate, halogenated acetate, propionate and halogenated propionate, the cation selected from primary, secondary and tertiary ammonium comprising 1-36 carbons, and quaternary ammonium comprising 4-48 carbons.
Additionally or alternatively, detection of a genetic biomarker can include a salt selected from the group: (a) an acetate salt of a cation of the formula HN(CH3)2Ra wherein Ra is a C4-C7 hydrocarbyl; (b) a halogenated acetate salt of a cation of the formula HN(CH3)2Rb wherein Rb is a C7-C12 hydrocarbyl; (c) acetate and halogenated acetate salts of a cation of the formula H2N(C5-C7 cycloalkyl)Rc where Rc is a C1-C12 hydrocarbyl; and (d) acetate and halogenated acetate salts of N-substituted piperidine, wherein the nitrogen of piperidine is substituted with C1-C12 hydrocarbyl. Additionally or alternatively, detection of a genetic biomarker can include an oligonucleotide in solution, where the oligonucleotide is formed from constituents including a plurality of fragments, each fragment shown schematically by structure (1)

wherein,

represents a sequence of at least three nucleotides as found in wild-type DNA, where “B” represents a base independently selected at each location; — represents a series of covalent chemical bonds termed a “specificity spacer,” which separates and connects two bases B3 and B5; the specificity spacer having steric and chemical properties such that (a) it does not prevent hybridization between a fragment of structure (1) and an oligonucleotide fragment having a complementary base sequence, as shown schematically as structure (2)

and (b) it cannot enter into hydrogen bonding with a base positioned opposite itself in a hybridized complementary base sequence of structure (2). Additionally or alternatively, detection of a genetic biomarker can include an array which includes a plurality of oligonucleotides immobilized in an array format to a solid support, each oligonucleotide of the plurality formed from components which include a plurality of fragments, each fragment shown schematically by structure (1)

wherein,

represents a sequence of at least three nucleotides as found in wild-type DNA, where “B” represents a base independently selected at each location; — represents a series of covalent chemical bonds termed a “specificity spacer,” which separates and connects two bases B3 and B5; the specificity spacer having steric and chemical properties such that (a) it does not prevent hybridization between a fragment of structure (1) and an oligonucleotide fragment having a complementary base sequence, as shown schematically as structure (2)

and (b) it cannot enter into hydrogen bonding with a base positioned opposite itself in a hybridized complementary base sequence of structure (2). Additionally or alternatively, detection of a genetic biomarker can include a method of distinguishing between hybridization of a complementary nucleic acid target and a nucleic acid probe in which the probe and target are perfectly complementary and in which the probe and target have one or more base mismatches, comprising: (a) mixing the nucleic acid target with the nucleic acid probe in a solution comprising a hybotrope; (b) hybridizing at a discriminating temperature; and (c) detecting probe hybridized to target, thereby determining whether the nucleic acid probe and target are perfectly complementary or mismatched. In a preferred embodiment, the nucleic acid probe is labeled with a radioactive molecule, fluorescent molecule, mass-spectrometry tag or enzyme. In preferred embodiments, the nucleic acid probe and/or the target nucleic acid is from 6 to 40 bases. Preferably, the hybotrope is an ammonium salt.
Specific preferred ammonium salt hybotropes include, without limitation, bis(2-methoxyethyl)amine acetate, 1-ethylpiperidine acetate, 1-ethylpiperidine trichloroacetate, 1-ethylpiperidine trifluoroacetate, 1-methylimidazole acetate, 1-methylpiperidine acetate, 1-methylpiperidine trichloroacetate, 1-methylpyrrolidine acetate, 1-methylpyrrolidine trichloroacetate, 1-methylpyrrolidine trifluoroacetate, 2-methoxyethylamine acetate, N,N-dimethylcyclohexylamine acetate, N,N-dimethylcyclohexylamine trifluoroacetate, N,N-dimethylcyclohexylamine, N,N-dimethylheptylamine acetate, N,N-dimethylheptylamine acetate, N,N-dimethylhexylamine acetate, N,N-dimethylhexylamine acetate, N,N-dimethylisopropylamine acetate, N-ethylbutylamine acetate, N-ethylbutylamine trifluoroacetate, N,N-dimethylaminobutane trichloroacetate, N,N-dimethylisopropylamine trichloroacetate, triethanolamine acetate, triethylamine acetate, triethylamine trichloroacetate, tripropylamine acetate, and tetraethylammonium acetate. Other suitable hybotropes include LiTCA, RbTCA, GuSCN, NaSCN, NaClO4, KI, TMATCA, TEATCA, TMATBA, TMTCA, TMTBA, TBATCA and TBATBA. Preferably, the hybotrope is present at a molarity of from about 0.005 M to about 6 M. Preferably, the probe nucleic acid is DNA or RNA, and the target nucleic acid is DNA or RNA. Preferably, the target nucleic acid is affixed to a solid substrate. Preferably, the method further comprises polymerase chain reaction.
Additionally or alternatively, detection of a genetic biomarker can include a method of distinguishing between hybridization of a complementary nucleic acid target and a nucleic acid probe in which the probe and target are perfectly complementary and in which the probe and target have one or more base mismatches, comprising: (a) mixing a nucleic acid target with a nucleic acid probe containing at least one abasic or deoxyNebularine substitution; (b) hybridizing at a discriminating temperature; and (c) detecting probe bound to the target, thereby determining whether the nucleic acid probe and target are perfectly complementary or mismatched. Additionally or alternatively, detection of a genetic biomarker can include a method of increasing discrimination in a nucleic acid synthesis procedure, comprising: (a) mixing a single-stranded nucleic acid target with an oligonucleotide primer in a solution comprising a hybotrope and a polymerase; (b) annealing the primer to the target at a discriminating temperature; and (c) synthesizing a complementary strand to the target to form a duplex. Additionally or alternatively, detection of a genetic biomarker can include a method of distinguishing a single base change in a nucleic acid molecule from a wild-type sequence, comprising: (a) mixing a single-stranded nucleic acid target with an oligonucleotide primer in a solution comprising an amine-based salt and a polymerase, wherein the oligonucleotide primer has a 3′-most base complementary to the wild-type sequence or the single base change; (b) annealing the primer to the target at a discriminating temperature; (c) extending the primer, wherein a complementary strand to the target is synthesized when the 3′-most base of the primer is complementary to the target; and (d) detecting the extension of the primer.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,248,521, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of amplifying nucleic acid molecules from a template comprising (a) mixing single-stranded nucleic acid templates on a solid substrate with a solution comprising an oligonucleotide primer that hybridizes to the templates and a DNA polymerase, wherein the mixing occurs in discrete areas on the substrate, and wherein the solution remains in the discrete areas; (b) synthesizing a complementary strand to the template to form a duplex; (c) denaturing the duplex; and (d) synthesizing complementary strands to the template, therefrom amplifying nucleic acid molecules; wherein mixing, synthesizing, and denaturing are conducted at dew point. The solid substrate may be a silicon wafer or glass slide. The templates may be covalently attached to the solid substrate or deposited on the surface of the substrate. The template may be uniformly applied to the entire array prior to mixing or applied individually to each discrete area on the substrate. When applied individually, preferably the applying is performed using spring probes.
In a most preferred embodiment, an apparatus is used to control the dew point. Additionally or alternatively, detection of a genetic biomarker can include a method of performing a single nucleotide extension assay, comprising (a) mixing oligonucleotides on a solid substrate with a solution comprising single-stranded nucleic acid molecules that hybridize to the oligonucleotides, a single nucleotide, and a DNA polymerase, wherein the mixing occurs in discrete areas of the substrate, and wherein the solution remains in discrete areas; and (b) detecting an extension product of the oligonucleotide; wherein the oligonucleotide will be extended only when the single nucleotide is complementary to the nucleotide adjacent to the hybridized oligonucleotide, wherein mixing is performed at dew point.
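
The extension rule of the single nucleotide extension assay can be illustrated with a minimal sketch. It models only the base-pairing logic (an oligonucleotide is extended when the supplied nucleotide is complementary to the template base adjacent to the hybridized region); it does not model the solid substrate, the polymerase, or dew point control. The function names and the example template sequence are hypothetical and are not taken from the referenced patent.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_extended(template, oligo, nucleotide):
    """Return True if `oligo` hybridized to `template` would be extended.

    `template` and `oligo` are written 5' to 3'. The oligo anneals to the
    stretch of the template equal to its reverse complement, and its 3' end
    is extended opposite the template base immediately 5' of that stretch.
    """
    site = reverse_complement(oligo)
    start = template.find(site)
    if start <= 0:
        # -1: no hybridization site; 0: no adjacent template base to copy.
        return False
    adjacent_base = template[start - 1]
    return COMPLEMENT[adjacent_base] == nucleotide


# Hypothetical template; the oligo is designed against positions 5 onward,
# so the interrogated template base is template[4] == "T".
template = "ACGTTGCAAGCTTAGC"
oligo = reverse_complement(template[5:])
results = {nt: is_extended(template, oligo, nt) for nt in "ACGT"}
```

In this sketch only the nucleotide complementary to the interrogated base ("A" opposite "T") yields an extension product, which is the signal detected in step (b).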

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0155705, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for isolating extracellular nucleic acids from a biological sample, comprising (a) preparing from the sample a binding mixture comprising: i) extracellular nucleic acids; ii) particles providing an anion exchange surface; iii) at least one non-ionic detergent which is a polyoxyalkylene fatty alcohol ether; iv) optionally at least one salt, wherein the binding mixture has a pH so that extracellular nucleic acids bind to the particles, (b) separating the particles with the bound extracellular nucleic acids from the remaining binding mixture; (c) optionally washing the bound extracellular nucleic acids; and (d) optionally eluting bound extracellular nucleic acids. In some embodiments, the binding mixture is prepared by forming a suspension by contacting the particles with a lysis and/or binding composition which comprises the at least one polyoxyalkylene fatty alcohol ether and which optionally comprises a salt and/or a buffer; contacting the suspension with the sample comprising extracellular nucleic acids; and optionally adding a proteolytic enzyme prior to, at the same time or after the sample was contacted with the suspension. Additionally or alternatively, detection of a genetic biomarker can include a kit for performing the method, which comprises (a) a lysis and/or binding composition comprising: i) at least one non-ionic detergent which is a polyoxyalkylene fatty alcohol ether; ii) optionally at least one salt; iii) at least one buffer; wherein said composition has an acidic pH; (b) particles providing an anion exchange surface; (c) optionally a proteolytic enzyme; (d) optionally one or more wash solutions and (e) optionally one or more elution solutions.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0148716, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of generating a circular double-stranded DNA (dsDNA) or a sequencing library, wherein the method comprises circularizing a dsDNA or ligating a first and a second dsDNA in the presence of a DNA ligase and a single-stranded DNA binding protein or a double-stranded DNA-binding protein. In some embodiments, the method of generating a sequencing library comprises further steps, preceding the ligation, of: (i) providing DNA fragments; (ii) end-repairing the DNA fragments by a polynucleotide kinase enzyme and an enzyme with polymerase and exonuclease activities; and (iii) optionally adding a terminal adenine to the end of the end-repaired DNA fragments by a deoxynucleotidyl transferase enzyme. In some embodiments, said method further comprises the subsequent steps of purification and size-selection of the ligated fragments for sequencing. In some embodiments, the adapter-ligated fragments are amplified prior to sequencing. Additionally or alternatively, detection of a genetic biomarker can include a kit comprising: (i) a DNA ligase; and (ii) a single-stranded DNA (ssDNA) binding protein or a double-stranded DNA (dsDNA)-binding protein. In some embodiments, the kit comprises: (i) a polynucleotide kinase and an enzyme with polymerase and exonuclease activities; (ii) optionally a deoxynucleotidyl transferase; (iii) a DNA ligase; (iv) a single-stranded or a double-stranded DNA binding protein; and (v) optionally a reaction buffer. In preferred embodiments, any of the kits comprises a mixture of a ligase, a single-stranded DNA (ssDNA) binding protein or a double-stranded DNA (dsDNA) binding protein, and optionally a reaction buffer.
In a preferred embodiment, the enzyme with polymerase and exonuclease activities is a DNA polymerase. In the methods or kits referenced above, the polynucleotide kinase enzyme is the T4 Polynucleotide Kinase (PNK), the enzyme with polymerase and exonuclease activities is T4 DNA Polymerase, and/or the deoxynucleotidyl transferase enzyme is a Taq polymerase or a Klenow Fragment exo-. In some embodiments, ligation methods are referred to, wherein both the first and the second dsDNAs comprise two ssDNA ends, whereby each of the ssDNA ends of the first dsDNA ligates with each of the complementary ss ends of the second dsDNA to provide ligated circular dsDNA. In some embodiments, the first or the second DNA is capable of conferring the ability to auto-replicate within competent cells. In the methods or kits referenced above, the DNA binding protein is a viral, bacterial, archaeal, or eukaryotic single-stranded DNA binding protein or double-stranded DNA binding protein. In some embodiments, the DNA ligase in any of the above methods or kits is a T3 DNA ligase or a T4 DNA ligase. In other embodiments, the ligase is a T7 DNA ligase or an Ampligase®. In some embodiments of the above methods, each of the first and the second dsDNA have one or two single stranded DNA (ssDNA) end(s). This/these ssDNA end(s) is/are less than 20 nucleotides (nt) in length.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0002738, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for amplifying target nucleic acids in a nucleic acid sample and kits useful in such methods. In some embodiments, the method for amplifying target nucleic acids in a nucleic acid sample comprises: (a) extending each of a plurality of barcode primers (BC primers) to obtain extension products using the target nucleic acids as templates, wherein (i) each barcode primer comprises, from 5′ to 3′, a 1st universal primer sequence (US1), a molecular tag sequence (MT), and a 1st target-specific sequence (TS1), (ii) the plurality of barcode primers comprises at least 20 different barcode primers, and (iii) among the plurality of barcode primers (BC primers), the 1st universal primer sequences (US1) are the same, but the 1st target-specific sequences (TS1) are different; (b) separating the plurality of barcode primers that have not been extended in step (a) from the extension products; and (c) amplifying the extension products of step (b) in the presence of a plurality of limited amplification primers (LA primers) to obtain a plurality of 1st amplification products, wherein (i) each limited amplification primer comprises, from 5′ to 3′, a 2nd universal primer sequence (US2) and a 2nd target-specific sequence (TS2), and (ii) among the plurality of limited amplification primers, the 2nd universal primer sequences (US2) are the same, but the 2nd target-specific sequences (TS2) are different.
Additionally or alternatively, detection of a genetic biomarker can include a kit comprising: (1) a plurality of barcode primers (BC primers), wherein (i) each barcode primer comprises, from 5′ to 3′, a 1st universal primer sequence (US1), a molecular tag sequence (MT), and a 1st target-specific sequence (TS1), (ii) the plurality of barcode primers comprises at least 20 different barcode primers, and (iii) among the plurality of barcode primers, the 1st universal primer sequences (US1) are the same, the molecular tag sequences (MT) are different, and the 1st target-specific sequences (TS1) are different; and (2) a plurality of limited amplification primers (LA primers), wherein (i) each limited amplification primer comprises, from 5′ to 3′, a 2nd universal primer sequence (US2) and a 2nd target-specific sequence (TS2), and (ii) among the plurality of limited amplification primers, the 2nd universal primer sequences (US2) are the same, but the 2nd target-specific sequences (TS2) are different.
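
As an illustrative sketch only, the 5′-to-3′ layout of the barcode primers (US1, then MT, then TS1) and of the limited amplification primers (US2, then TS2) can be modeled as string concatenation. Every sequence below is an arbitrary placeholder, and the segment lengths (18-nt target-specific sequences, 12-nt molecular tags), the helper names, and the random seed are assumptions made for the example; none of them come from the referenced publication.

```python
import random

BASES = "ACGT"


def random_unique_seqs(count, length, rng):
    """Return `count` distinct random DNA strings of the given length."""
    seqs = set()
    while len(seqs) < count:
        seqs.add("".join(rng.choice(BASES) for _ in range(length)))
    return sorted(seqs)


def make_bc_primers(us1, tags, ts1_seqs):
    """Barcode primers, written 5' to 3': US1 + MT + TS1."""
    return [us1 + mt + ts1 for mt, ts1 in zip(tags, ts1_seqs)]


def make_la_primers(us2, ts2_seqs):
    """Limited amplification primers, written 5' to 3': US2 + TS2."""
    return [us2 + ts2 for ts2 in ts2_seqs]


rng = random.Random(0)             # fixed seed so the example is repeatable
US1 = "ACGTACGTACGTACGTACGT"       # placeholder shared universal sequence
US2 = "TGCATGCATGCATGCATGCA"       # placeholder shared universal sequence
TS1 = random_unique_seqs(20, 18, rng)   # placeholder target-specific parts
TS2 = random_unique_seqs(20, 18, rng)
MT = random_unique_seqs(20, 12, rng)    # placeholder molecular tags

bc_primers = make_bc_primers(US1, MT, TS1)
la_primers = make_la_primers(US2, TS2)
```

The resulting set contains 20 different barcode primers that share US1 and differ in MT and TS1, which matches the kit variant described above; the method variant only requires that the TS1 portions differ.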

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2016/0374330, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method suitable for stabilizing an extracellular nucleic acid population comprised in a cell-containing biological sample, comprising contacting the cell-containing sample with at least one poly(oxyethylene) polymer as stabilizing agent or with mono-ethylene glycol as stabilizing agent. Additionally or alternatively, detection of a genetic biomarker can include a method for isolating extracellular nucleic acids from a cell-containing biological sample, wherein said method comprises: a) stabilizing the cell-containing biological sample according to the method defined above; and b) isolating extracellular nucleic acids from the stabilized sample. Additionally or alternatively, detection of a genetic biomarker can include a composition suitable for stabilizing a cell-containing biological sample, comprising: i) a poly(oxyethylene) polymer as stabilizing agent or ii) mono-ethylene glycol as stabilizing agent and one or more, preferably two or more further additives selected from the group consisting of: one or more primary, secondary or tertiary amides; a caspase inhibitor; an anticoagulant and/or a chelating agent. Preferably, the composition comprises a poly(oxyethylene) polymer, which preferably is a high molecular weight poly(oxyethylene) polymer having a molecular weight of at least 1500, as stabilizing agent and furthermore comprises one or more, preferably two or more further additives selected from the group consisting of at least one further poly(oxyethylene) polymer having a molecular weight that is at least 100, preferably at least 200, at least 300 or at least 400 below the molecular weight of the first poly(oxyethylene) polymer, which preferably is a high molecular weight poly(oxyethylene) polymer, wherein said further poly(oxyethylene) polymer preferably is a low molecular weight poly(oxyethylene) polymer having a molecular weight of 1000 or less; one or more primary, secondary or tertiary amides; a caspase inhibitor; an anticoagulant and/or a chelating agent.
Additionally or alternatively, detection of a genetic biomarker can include a collection device for collecting a cell-containing biological sample, wherein the collection device comprises i) a poly(oxyethylene) polymer as stabilizing agent or ii) mono-ethylene glycol as stabilizing agent and one or more further additives selected from the group consisting of: one or more primary, secondary or tertiary amides; a caspase inhibitor; an anticoagulant and/or a chelating agent. Preferably, the collection device comprises a poly(oxyethylene) polymer, which preferably is a high molecular weight poly(oxyethylene) polymer having a molecular weight of at least 1500, as stabilizing agent and furthermore comprises one or more, preferably two or more further additives selected from the group consisting of at least one further poly(oxyethylene) polymer having a molecular weight that is at least 100, preferably at least 200, at least 300 or at least 400 below the molecular weight of the first poly(oxyethylene) polymer, which preferably is a high molecular weight poly(oxyethylene) polymer, wherein said further poly(oxyethylene) polymer preferably is a low molecular weight poly(oxyethylene) polymer having a molecular weight of 1000 or less; one or more primary, secondary or tertiary amides; a caspase inhibitor; an anticoagulant and/or a chelating agent.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2016/0048564, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for building a community database of variant observations, comprising: receiving human variant datasets derived from samples generated by a plurality of distinct users, wherein the users consented to share pooled variant observations with other users; storing the received human variant datasets in a knowledge base of genomic information; searching the knowledge base to identify a plurality of variant observations that meet inclusion criteria for a pool; adding each identified variant observation to the pool; and calculating one or more anonymized allele statistics from the pool, wherein at least one of the receiving, storing, searching, adding, or calculating is performed by one or more computers. Additionally or alternatively, detection of a genetic biomarker can include a method for determining a candidate for a clinical trial, comprising: receiving clinical trial enrollment criteria from a user including genetic targeting criteria; searching a knowledge base of patient test information received from a plurality of independent entities for patients that match the clinical trial enrollment criteria; and providing to the user search results for consented patients that match the clinical trial enrollment criteria; wherein at least one of the receiving, searching, or providing is performed by one or more computers.
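
The pooling and statistics steps of the community database method can be sketched as follows. This is a simplified illustration under stated assumptions, not the referenced publication's implementation: the record layout, the field names, the diploid allele count, the inclusion predicate, and the example variant labels ("GENE1:c.100A>G", "GENE2:c.7C>T") are all hypothetical. Anonymization is modeled simply by reporting per-variant aggregates that carry no user or sample identifiers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantObservation:
    user_id: str           # submitting user; never appears in the output
    sample_id: str         # sample identifier; never appears in the output
    variant: str           # hypothetical variant label
    alt_allele_count: int  # 0, 1, or 2 copies, assuming a diploid genotype
    consented: bool        # user agreed to share pooled observations


def build_pool(knowledge_base, meets_criteria):
    """Select consented observations that meet the pool's inclusion criteria."""
    return [obs for obs in knowledge_base
            if obs.consented and meets_criteria(obs)]


def anonymized_allele_stats(pool):
    """Aggregate per-variant counts and allele frequency without identifiers."""
    alt_alleles = Counter()
    samples = Counter()
    for obs in pool:
        alt_alleles[obs.variant] += obs.alt_allele_count
        samples[obs.variant] += 1
    return {
        variant: {
            "samples": samples[variant],
            "alt_alleles": alt_alleles[variant],
            # Two alleles per sample under the diploid assumption.
            "alt_allele_frequency": alt_alleles[variant] / (2 * samples[variant]),
        }
        for variant in samples
    }


knowledge_base = [
    VariantObservation("user1", "s1", "GENE1:c.100A>G", 1, True),
    VariantObservation("user2", "s2", "GENE1:c.100A>G", 0, True),
    VariantObservation("user3", "s3", "GENE1:c.100A>G", 2, True),
    VariantObservation("user4", "s4", "GENE1:c.100A>G", 1, False),  # excluded
    VariantObservation("user1", "s5", "GENE2:c.7C>T", 1, True),
]
pool = build_pool(knowledge_base, lambda obs: obs.variant.startswith("GENE"))
stats = anonymized_allele_stats(pool)
```

A production system would also need safeguards that this sketch omits, such as suppressing statistics for pools too small to be anonymous.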

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2016/0017320, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for reducing errors in sequencing and thus improving accuracy in mutation detection and transcriptome profiling, including methods that use semi-random barcodes to tag sequencing fragments before amplification to reduce bias and sequencing errors. Additionally or alternatively, detection of a genetic biomarker can include oligonucleotides that comprise semi-random barcode sequences, including sequencing adapters, reverse transcription primers and PCR primers. The term “semi-random barcode sequences,” “semi-random sequences,” or “semi-random barcodes” refers to a population of semi-random nucleotide sequences each consisting of (Xmer)n, wherein Xmer is 3-mer (i.e., a 3-nucleotide oligonucleotide, also referred to as “trimer”), 4-mer (i.e., a 4-nucleotide oligonucleotide, also referred to as “tetramer”), 5-mer (i.e., a 5-nucleotide oligonucleotide, also referred to as “pentamer”), or 6-mer (i.e., a 6-nucleotide oligonucleotide, also referred to as “hexamer”), and n is an integer from 2 to 10. Each nucleotide sequence in the population is referred to as a “semi-random barcode sequence,” “semi-random barcode,” or “semi-random sequence.” In certain embodiments, the semi-random sequences consist of (Xmer)n, wherein Xmer is 3-mer, and n is 2, 3, 4, 5, 6, 7, 8, 9, or 10, preferably 4, 5, 6, 7, 8 or 9. In certain embodiments, Xmer is 4-mer, and n is 2, 3, 4, 5, 6, 7, 8, or 9, preferably 2, 3, 4, 5, 6, or 7. In certain embodiments, Xmer is 5-mer, and n is 2, 3, 4, 5, 6, 7, or 8, preferably 2, 3, 4, 5, or 6. In certain embodiments, Xmer is 6-mer, and n is 2, 3, 4, 5, 6, or 7, preferably 2, 3, 4, or 5.
The semi-random barcode sequences may be synthesized from a mixture of Xmers with defined sequences. For example, in certain embodiments, the semi-random barcodes consist of (Xmer)n, wherein Xmer is 3-mer and n is 7. In other words, the semi-random barcodes are a population of 21 bp oligonucleotides that consist of 7 trimers. Such semi-random barcodes may be synthesized in 7 successive steps, during each of which a random trimer from a defined trimer mixture may be incorporated. A defined Xmer mixture for synthesizing semi-random barcodes may have 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25, or at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25, different Xmers. The number of different semi-random barcode sequences synthesized from a defined Xmer mixture may be at least 100, 200, 300, 400, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 10,000, 50,000, 100,000, 500,000, or 1,000,000. Preferably, each Xmer (e.g., trimer, tetramer, pentamer, and hexamer) has at least 2 bases different from another Xmer in a defined Xmer mixture so that any single-base variant within each Xmer block in sequencing reads can be identified as an error, not a different barcode. In certain embodiments, each Xmer (e.g., tetramer, pentamer, and hexamer) has at least 3 bases different from another Xmer in a defined Xmer mixture so that any single- or 2-base variant within each Xmer block in sequencing reads can be identified as an error, not a different barcode. In certain embodiments, each Xmer (e.g., pentamer and hexamer) has at least 4 bases different from another Xmer in a defined Xmer mixture so that any 1-, 2-, or 3-base variant within each Xmer block in sequencing reads can be identified as an error, not a different barcode.
Additionally or alternatively, detection of a genetic biomarker can include a plurality of single-stranded (ss) oligonucleotides, wherein each oligonucleotide comprises from the 5′ to 3′ direction a 1st sequence and a 2nd sequence; (a) the 1st sequence is a semi-random sequence consisting of (Xmer)n, wherein Xmer is 3-mer, 4-mer, 5-mer, or 6-mer, and n is an integer from 2 to 8, and (b) the 2nd sequence is (i) at least 10 nucleotides in length, (ii) fully or substantially complementary to a target sequence, and (iii) the same among the plurality of oligonucleotides. Additionally or alternatively, detection of a genetic biomarker can include a plurality of double-stranded (ds) sequencing adapters, wherein each sequencing adapter comprises: (a) an oligonucleotide (“1st oligonucleotide”) from above, and (b) a 2nd ss oligonucleotide that comprises from the 3′ to 5′ direction (i) a sequence (“Sequence A”) that is fully complementary to the 1st sequence of the 1st oligonucleotide of the sequencing adapter, and (ii) the target sequence (“Sequence B”), and wherein the 1st oligonucleotide anneals to the 2nd ss oligonucleotide.
Additionally or alternatively, detection of a genetic biomarker can include a plurality of sets of double-stranded (ds) sequencing adapters, wherein (A) each set comprises a plurality of single-stranded (ss) sequencing adapters, wherein each sequencing adapter in each set comprises: (a) an oligonucleotide (“1st oligonucleotide”) of the plurality of ss oligonucleotides from above, and (b) a 2nd ss oligonucleotide that comprises: (i) a sequence (“Sequence A”) that is fully complementary to the 1st sequence of the 1st oligonucleotide of the sequence adapter, and (ii) the target sequence (“Sequence B”) located 5′ to Sequence A, and (iii) a sequence (“Sequence C”) that is located 3′ to Sequence B and is fully complementary to the 3rd sequence of the 1st oligonucleotide, and wherein the 1st oligonucleotide anneals to the 2nd oligonucleotide; and (B) wherein the plurality of ds sequencing adapters in different sets are identical to each other except in the 3rd sequence of the 1st oligonucleotide and in Sequence C of the 2nd oligonucleotide. Additionally or alternatively, detection of a genetic biomarker can include a method for preparing a sequencing library that comprises (1) ligating the plurality of ds sequencing adapters that comprise a semi-random barcode sequence (i.e., the 1st sequence of the 1st oligonucleotide) to dsDNA molecules or fragments of a sample.
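The reasoning behind the minimum base difference between Xmers can be made concrete with a short sketch. It assumes a greedy selection of the defined Xmer mixture and a simple block-wise membership check; both are illustrative choices, and the function names are hypothetical rather than part of the incorporated publication.

```python
import itertools
import random


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def build_xmer_mixture(k, min_distance, max_size):
    """Greedily pick k-mers whose pairwise Hamming distance is >= min_distance."""
    chosen = []
    for kmer in map("".join, itertools.product("ACGT", repeat=k)):
        if all(hamming(kmer, c) >= min_distance for c in chosen):
            chosen.append(kmer)
            if len(chosen) == max_size:
                break
    return chosen


def make_barcode(mixture, n, rng):
    """Concatenate n Xmers drawn at random from the defined mixture: (Xmer)n."""
    return "".join(rng.choice(mixture) for _ in range(n))


def classify_barcode(read, mixture, k):
    """Return 'valid' if every k-base block is a defined Xmer, else 'error'."""
    allowed = set(mixture)
    blocks = [read[i:i + k] for i in range(0, len(read), k)]
    return "valid" if all(block in allowed for block in blocks) else "error"


if __name__ == "__main__":
    rng = random.Random(0)
    trimers = build_xmer_mixture(k=3, min_distance=2, max_size=8)
    barcode = make_barcode(trimers, n=7, rng=rng)  # 7 trimers, 21 bases
    print(barcode, classify_barcode(barcode, trimers, k=3))
```

Because every pair of Xmers in the mixture differs at two or more positions, a single-base change inside a block can never turn one defined Xmer into another, so the block falls outside the mixture and is flagged as an error.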

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0275267, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for preparing a target RNA depleted composition from an initial RNA containing composition, comprising: a) contacting the initial RNA containing composition with one or more groups of probe molecules, wherein a group of probe molecules has the following characteristics: i) the group comprises two or more different probe molecules having a length of 100 nt or less; ii) the probe molecules comprised in said group are complementary to a target region present in a target RNA; iii) when hybridized to said target region, the two or more different probe molecules are located adjacent to each other in the formed double-stranded hybrid; and generating a double-stranded hybrid between the target RNA and the probe molecules; b) capturing the double-stranded hybrid by using a binding agent which binds the double-stranded hybrid, thereby forming a hybrid/binding agent complex; c) separating the hybrid/binding agent complexes from the composition, thereby providing a target RNA depleted composition. Additionally or alternatively, detection of a genetic biomarker can include specifically designed groups of probe molecules which hybridize to and thus mark unwanted target RNA, e.g., different rRNA species, for depletion. Each group of probe molecules targets a specific region in a target RNA, also referred to as target region, and comprises two or more different short probe molecules which hybridize to said target region. When hybridized to their target region, the short probe molecules of one group are located adjacent to each other in the formed double-stranded hybrid and thus are located in close proximity.
The formed double-stranded hybrid spans and thus covers the target region. The formed double-stranded hybrid which comprises the short probe molecules of one group is then bound by an anti-hybrid binding agent, whereby a hybrid/binding agent complex is formed. Said complexes can be easily separated from the remaining composition, thereby removing unwanted target RNA and thus providing a target RNA depleted composition. Additionally or alternatively, detection of a genetic biomarker can include a method for sequencing RNA molecules of interest comprised in a sample, comprising: a) obtaining an RNA containing composition, preferably by isolating total RNA from the sample; b) depleting unwanted target RNA from the RNA containing composition, which preferably is total RNA, using the method according to the first aspect, thereby providing a target RNA depleted composition; c) optionally removing unbound probe molecules; d) sequencing RNA molecules comprised in the target RNA depleted composition.
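The adjacency requirement for a group of short probes can be sketched as a simple tiling of the target region. The sketch below assumes DNA probes against an RNA target and a fixed probe length; neither choice is stated in the text above, and `design_probe_group` is a hypothetical helper.

```python
# Complement table for a DNA probe hybridizing to an RNA target (assumption).
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def design_probe_group(target_region, probe_len=50):
    """Tile a target region with adjacent, non-overlapping probes.

    Each probe is the reverse complement of one window of the target, so the
    hybridized probes sit directly next to each other and together cover the
    whole region. Probes are limited to 100 nt, and a group needs two or more.
    """
    if not 0 < probe_len <= 100:
        raise ValueError("probe length must be between 1 and 100 nt")
    probes = []
    for start in range(0, len(target_region), probe_len):
        window = target_region[start:start + probe_len]
        probes.append(window.translate(RNA_TO_DNA_COMPLEMENT)[::-1])
    if len(probes) < 2:
        raise ValueError("a probe group needs two or more probes")
    return probes
```

A real design would also screen probes for off-target hybridization and melting temperature, which this sketch leaves out.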

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0225775, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for performing polymerase chain reaction (PCR), comprising: a) amplifying one or more different target nucleic acids in the presence of one or more different primer pairs specific to the one or more different target nucleic acids in a single reaction mixture via PCR, wherein each primer of the one or more different primer pairs contains one or more cleavable bases, and wherein in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 4 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 5 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 6 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 7 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases.
In certain other embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 8 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. Step a) may comprise amplifying a single target nucleic acid in the presence of a primer pair specific to the single target nucleic acid in the single reaction mixture. Alternatively, step a) may comprise amplifying a plurality of different target nucleic acids in the presence of a plurality of different primer pairs specific to the plurality of different target nucleic acids in the single reaction mixture. In certain embodiments, in all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 4 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 5 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 6 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 7 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 8 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. Preferably, the cleavable base is uracil. Alternatively, the cleavable base is inosine, an oxidized pyrimidine, an oxidized purine, 5-hydroxyuracil, 5-hydroxymethyluracil, or 5-formyluracil.
The one or more different primer pairs may comprise at least 100 different primer pairs. In some embodiments, the method may further comprise one or more of: b) cleaving the one or more cleavable bases in the amplification product(s) of step a) to produce single-stranded DNA overhangs in the amplification product(s), c) digesting the single-stranded DNA overhangs obtained in step b) to generate trimmed amplification product(s), d) ligating adapters to the trimmed amplification product(s) to produce adapter-linked trimmed amplification product(s), and e) sequencing the adapter-linked trimmed amplification product(s) of step d). Additionally or alternatively, detection of a genetic biomarker can include a primer pair set, comprising: one or more different primer pairs specific for one or more different target nucleic acids, wherein each primer of the one or more different primer pairs contains one or more cleavable bases, and wherein in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 4 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 5 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 6 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. In certain other embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 7 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases.
In certain other embodiments, in substantially all of the primers of the one or more different primer pairs, each of the one or more cleavable bases is at least 8 nucleotides away from the 3′ terminus of the primer that comprises the one or more cleavable bases. Additionally or alternatively, detection of a genetic biomarker can include a PCR reaction mixture, comprising: the primer pair set, a DNA polymerase, dNTPs, and a PCR reaction buffer.
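The positional constraint on cleavable bases lends itself to a simple check. The sketch below assumes primers are written 5′ to 3′ with the cleavable base encoded as `U`, and it measures distance as the number of positions between the cleavable base and the 3′-terminal nucleotide; that counting convention and the function names are assumptions for illustration.

```python
def cleavable_base_distances(primer, cleavable="U"):
    """Distance of each cleavable base from the 3'-terminal nucleotide.

    The primer is read 5' to 3', so the last character is the 3' terminus
    and has distance 0 (an assumed counting convention).
    """
    last = len(primer) - 1
    return [last - i for i, base in enumerate(primer) if base == cleavable]


def primer_meets_constraint(primer, min_distance=4, cleavable="U"):
    """True if the primer has a cleavable base and all are far enough from the 3' end."""
    distances = cleavable_base_distances(primer, cleavable)
    return bool(distances) and min(distances) >= min_distance


def fraction_meeting_constraint(primers, min_distance=4, cleavable="U"):
    """Share of primers in a set that satisfy the constraint."""
    if not primers:
        raise ValueError("primer set is empty")
    passing = sum(primer_meets_constraint(p, min_distance, cleavable)
                  for p in primers)
    return passing / len(primers)
```

The text does not quantify "substantially all", so the sketch reports the passing fraction and leaves the threshold to the user.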

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0197787, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for enriching target sequences from a sequencing library to provide a target enriched sequencing library, wherein the sequencing library is suitable for massive parallel sequencing and comprises a plurality of double-stranded nucleic acid molecules, wherein the method comprises: a) providing nucleoprotein filaments comprising (i) a single-stranded invasion probe, wherein the invasion probe has a region of substantial complementarity to one strand of a double-stranded target sequence, (ii) a recombinase; b) forming a complex between the invasion probe and a complementary portion of the target sequence wherein complex formation is mediated by the recombinase; c) separating the complexes from the remaining sequencing library, thereby enriching the target sequences. Additionally or alternatively, detection of a genetic biomarker can include a method for sequencing a target region of interest, comprising: a) providing a sequencing library suitable for massive parallel sequencing and comprising a plurality of double-stranded nucleic acid molecules, wherein a portion of the double-stranded nucleic acid molecules comprised in the sequencing library, the target sequences, comprise a sequence which lies in the target region of interest; b) enriching target sequences corresponding to the target region of interest according to the method above, thereby providing a target enriched sequencing library; c) sequencing the enriched target sequences in parallel.
Additionally or alternatively, detection of a genetic biomarker can include use of the method for sequencing for exome sequencing, exon sequencing, targeted genomic resequencing, gene panel orientated targeted genomic resequencing, transcriptome sequencing and/or molecular diagnostics. Additionally or alternatively, detection of a genetic biomarker can include a kit for performing a method according to the first aspect, which comprises a) adaptors for creating a sequencing library suitable for massive parallel sequencing; b) optionally one or more ligation reagents for coupling the adaptors to a nucleic acid fragment; c) a recombinase, preferably a RecA like recombinase; d) a non-hydrolyzable co-factor for the recombinase, preferably adenosine 5′-(gamma-thio)triphosphate; e) a plurality of different invasion probes wherein the invasion probes differ in their region of complementarity to a target region of interest; f) a plurality of different stabilization probes being at least partially complementary to the plurality of invasion probes; and g) a solid support suitable for capturing synaptic complexes formed between the invasion probes and target sequences.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0093756, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of amplifying a target nucleic acid in a helicase-dependent reaction. Additionally or alternatively, detection of a genetic biomarker can include methods of amplifying and detecting a target nucleic acid in a helicase-dependent reaction as well as modified detection labels to assist in the detection. In some embodiments, the method for amplifying a target nucleic acid in a helicase-dependent reaction comprises: (a) providing target nucleic acid to be amplified; wherein the target nucleic acid is double stranded and is denatured by heating at 65° C. for 10 minutes in the presence of 50 mM NaOH prior to step (b); (b) adding oligonucleotide primers for hybridizing to the target nucleic acid of step (a); (c) synthesizing an extension product of the oligonucleotide primers which are complementary to the templates, by means of a DNA polymerase to form a duplex; (d) contacting the duplex of step (c) with a helicase preparation for unwinding the duplex such that the helicase preparation comprises a helicase and a single strand binding protein (SSB) unless the helicase preparation comprises a thermostable helicase wherein the single strand binding protein is optional; and (e) repeating steps (b)-(d) to exponentially and selectively amplify the target nucleic acid in a helicase-dependent reaction. Additionally or alternatively, detection of a genetic biomarker can include a method of amplifying a target nucleic acid in a helicase-dependent reaction where the target nucleic acid is subjected to a “pre” step involving RNA probes and RNA-DNA hybrid capture antibodies.
This method comprises: (a) providing target nucleic acid to be amplified; wherein the target nucleic acid is single-stranded DNA and wherein an RNA probe that is complementary is added to the single-stranded DNA to bind to the DNA to form a target nucleic acid RNA-DNA hybrid; and wherein a hybrid capture antibody that recognizes RNA-DNA hybrids and is bound to a magnetic bead is added to the RNA-DNA hybrid to be used in step (b); (b) adding oligonucleotide primers for hybridizing to the target nucleic acid RNA-DNA hybrid of step (a); (c) synthesizing an extension product of the oligonucleotide primers which are complementary to the templates, by means of a DNA polymerase to form a duplex; (d) contacting the duplex of step (c) with a helicase preparation for unwinding the duplex such that the helicase preparation comprises a helicase and a single strand binding protein (SSB) unless the helicase preparation comprises a thermostable helicase wherein the single strand binding protein is optional; and (e) repeating steps (b)-(d) to exponentially and selectively amplify the target nucleic acid in a helicase-dependent reaction. Additionally or alternatively, detection of a genetic biomarker can include a modified TaqMan probe (and a method using this probe). The probe has a short tail at the 3′- or 5′-end that is complementary to the 5′- or 3′-end, wherein the TaqMan probe is complementary to the target nucleic acid except for this short tail, and wherein the short tail sequence forms a stem loop structure.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0011416, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for determining the presence of one or more target polynucleotides, the method comprising: performing a PCR amplification regimen comprising cycles of strand separation, primer annealing, and primer extension on a reaction mixture comprising a nucleic acid sample and a set of oligonucleotide primers specific for each target polynucleotide; wherein each set of oligonucleotide primers comprises a first subset of at least one truncated dual domain forward primer and a second subset of at least one reverse primer; wherein each truncated dual domain primer of a set comprises a 5′ tail region that differs from the 5′ tail region on other truncated dual domain primers in the set and a 3′ core region complementary to a sequence on one strand of a double-stranded nucleic acid comprising the target; wherein for each truncated dual domain primer, the 3′ core substantially anneals to its complementary target site sequence at a first annealing temperature, and the sequence comprised by the 5′ tail and 3′ core region substantially anneals to its complement at a second annealing temperature, the second annealing temperature being higher than the first annealing temperature, such that at the second annealing temperature the 3′ core of the truncated dual domain primer cannot substantially anneal to a template molecule that does not also have the complement of the truncated dual domain primer's 5′ tail sequence; wherein for each truncated dual domain primer of a primer set: the 5′ tail sequence does not have any homology to the target sequence; and the 5′ tail sequence has 6 or fewer contiguous homologous bases relative to any other 5′ tail sequence of the truncated dual domain primer 
set; wherein the PCR amplification regimen comprises first and second phases, the first phase comprising annealing at the first annealing temperature for a first set of cycles, and the second phase comprising annealing at the second annealing temperature for a second set of cycles; and detecting an amplified product for each target polynucleotide, wherein the detecting indicates the presence of the target polynucleotide. In some embodiments, a reverse primer comprises a dual domain primer. In some embodiments, a reverse primer comprises a truncated dual domain primer. In some embodiments, a reverse primer comprises an amplifying primer. In some embodiments, the set of primers comprises at least one dual domain forward primer; wherein each dual domain primer of a set comprises a 5′ tail region that differs from the 5′ tail region of other dual domain primers in the set, a 3′ core region complementary to a sequence on one strand of a double-stranded nucleic acid comprising said target, and a terminal nucleotide complementary to one of the variant nucleotides occurring at said target site; wherein for each dual domain primer, the 3′ core substantially anneals to its complementary target site sequence at a first annealing temperature, and the sequence comprised by the 5′ tail and 3′ core region substantially anneals to its complement at a second annealing temperature, the second annealing temperature being higher than the first annealing temperature, such that at said second annealing temperature said 3′ core of said dual domain primer cannot substantially anneal to a template molecule that does not also have the complement of the dual domain primer's 5′ tail sequence; and wherein for each truncated dual domain primer of a primer set: the 5′ tail sequence does not have any homology to the target sequence; and the 5′ tail sequence has 6 or fewer contiguous homologous bases relative to any other 5′ tail sequence of the dual domain or truncated dual domain primer set.
Additionally or alternatively, detection of a genetic biomarker can include a composition for determining the presence of one or more target polynucleotides, comprising: at least one set of oligonucleotide primers specific for each target polynucleotide; wherein each set of oligonucleotide primers comprises a first subset of at least one truncated dual domain forward primer and a second subset of at least one reverse primer; wherein each truncated dual domain primer of a set comprises a 5′ tail region that differs from the 5′ tail region on other truncated dual domain primers in the set and a 3′ core region complementary to a sequence on one strand of a double-stranded nucleic acid comprising the target; wherein for each truncated dual domain primer, the 3′ core substantially anneals to its complementary target site sequence at a first annealing temperature, and the sequence comprised by the 5′ tail and 3′ core region substantially anneals to its complement at a second annealing temperature, the second annealing temperature being higher than the first annealing temperature, such that at the second annealing temperature the 3′ core of the truncated dual domain primer cannot substantially anneal to a template molecule that does not also have the complement of the primer's 5′ tail sequence; wherein for each member of the truncated dual domain primer set: the 5′ tail sequence does not have any homology to the target sequence; and the 5′ tail sequence has 6 or fewer contiguous homologous bases relative to the other 5′ tail sequences of the truncated dual domain primer set. In some embodiments, the composition can further comprise a nucleic acid sample.
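The limit of 6 or fewer contiguous homologous bases between 5′ tails can be screened computationally. The sketch below interprets "homologous" as identical bases in the same orientation, which is an assumption, and uses a standard longest-common-substring scan; the function names are hypothetical.

```python
from itertools import combinations


def longest_common_run(a, b):
    """Length of the longest contiguous stretch shared by sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous base of each sequence.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def tails_compatible(tails, max_run=6):
    """True if no two 5' tails share more than max_run contiguous bases."""
    return all(longest_common_run(a, b) <= max_run
               for a, b in combinations(tails, 2))
```

The same `longest_common_run` helper could be applied between each tail and the target sequence, although the text does not say how much shared sequence would count as homology to the target.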

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2010/0113758, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a process for the purification of biomolecules from a sample which comprises the following steps: a) arrangement of a reaction vessel with a binding matrix in a centrifuge, wherein a solution or suspension of a sample containing biomolecules is prepared in the reaction vessel or introduced into the reaction vessel before or after this step; and b) inclusion of at least one multi-stage centrifugation step comprising at least a first centrifugation step at a first acceleration value and at least a second centrifugation step at a second acceleration value which is higher than the first acceleration value; wherein c) step b) can be a binding step, a washing step and/or an elution step. Preferably, the multi-stage step b) is a binding step in which the biomolecules are bound to the binding matrix by centrifugation. Particularly preferably, it is envisaged that the biomolecules are substances chosen from the group containing nucleic acids, amino acids, oligopeptides, polypeptides, monosaccharides, oligosaccharides, polysaccharides, fats, fatty acids and/or lipids. In some embodiments, the binding matrix comprises a silicate substrate, and furthermore the sample containing biomolecules is mixed with at least one chaotropic salt before the centrifugation. This embodiment is suitable in particular for nucleic acids.
Preferably, the following steps are envisaged in this embodiment: a) arrangement of a column-like reaction vessel with a binding matrix comprising a silicate substrate in a centrifuge, wherein a solution or suspension of a nucleic acid-containing sample and at least one chaotropic salt is prepared in the reaction vessel or introduced into the reaction vessel before or after this step; b) inclusion of a first centrifugation step at a first acceleration value; c) inclusion of a second centrifugation step at a second acceleration value which is higher than the first acceleration value; d) optionally inclusion of further centrifugation steps between step b) and step c) or after step c); e) optionally inclusion of one or more washing steps; and f) elution of the nucleic acids bound to the silicate substrate with an elution solution. In this embodiment, the multi-step centrifugation step is a binding step in which the nucleic acids are bound to the silicate matrix.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2009/0298187, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for determining the presence of a target nucleic acid in a sample. The method comprises: a) contacting one or more polynucleotide probes with the sample under a hybridization condition sufficient for the one or more polynucleotide probes to hybridize to the target nucleic acid in the sample to form double-stranded nucleic acid hybrids, wherein the one or more polynucleotide probes do not hybridize to a variant of the target nucleic acid; and b) detecting the double-stranded nucleic acid hybrids, wherein detecting comprises contacting the double-stranded nucleic acid hybrids with a first anti-hybrid antibody that is immunospecific to double-stranded nucleic acid hybrids, whereby detection of the double-stranded nucleic acid hybrids determines the presence of the target nucleic acid in the sample. In some embodiments, the hybridization of the nucleic acids and detection of the double-stranded nucleic acid hybrids are performed at the same time. In some embodiments, after the double-stranded nucleic acid hybrids are contacted with a first anti-hybrid antibody that is immunospecific to double-stranded nucleic acid hybrids, a second anti-hybrid antibody is added to detect the double-stranded nucleic acid hybrids, whereby detection of the double-stranded nucleic acid hybrids by the second anti-hybrid antibody determines the presence of the target nucleic acid in the sample.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,686,157, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method and compositions for the sensitive detection of the amount and location of specific nucleic acid sequences. In some embodiments, the method makes use of a branched oligomer, referred to as a lollipop oligomer, that has a tail portion, a right arm portion, and a left arm portion. These three components are joined at a common junction making a three-tailed structure. The two arms each end with sequences complementary to adjacent sequences in a target sequence. This allows the right and left arms to be ligated together when the oligomer is hybridized to the target sequence, thus topologically linking the oligomer to the target sequence. The tail portion can then be detected at the location of the target sequence. By using the tail of the oligomer to prime rolling circle replication of a DNA circle, a long tandem repeat DNA is associated with the target sequence. Rolling circle replication does not disturb association of the arms and the target sequence, thus maintaining close association of the tandem repeat DNA and the target sequence.
Additionally or alternatively, detection of a genetic biomarker can include a method of amplifying nucleic acid sequences, the method comprising: (a) mixing one or more different lollipop oligomers with one or more target samples each comprising one or more target sequences, and incubating under conditions that promote hybridization between the oligomers and the target sequences, wherein the lollipop oligomers each comprise a branched oligomer comprising a tail portion, wherein the tail portion comprises a rolling circle replication primer, wherein the rolling circle replication primer comprises a complementary portion that is complementary to a primer complement portion of an amplification target circle, (b) prior to, simultaneous with, or following step (a), mixing one or more amplification target circles with the oligomers, and incubating under conditions that promote hybridization between the amplification target circles and the rolling circle replication primer portions of the oligomers, and (c) mixing DNA polymerase with the oligomers and amplification target circles, and incubating under conditions that promote replication of the amplification target circles, wherein replication of the amplification target circles results in the formation of tandem sequence DNA.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 6,090,935, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for isolating nucleic acid from a sample, said method comprising boiling said sample, cooling the boiled sample, allowing the nucleic acid in the liquid phase of the cooled sample to directly bind to a solid support comprising magnetic particles, and separating the solid support with the nucleic acid bound thereto from the remainder of said liquid phase. Additionally or alternatively, detection of a genetic biomarker can include a method for isolating nucleic acid from a sample that is fixed or aged, said method comprising boiling the fixed or aged sample, cooling the boiled sample, allowing the nucleic acid in the cooled sample to directly bind to a solid support having a high surface area comprising magnetic particles, and separating the solid support with the nucleic acid bound thereto from the remainder of the cooled sample.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/032808, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of RNA sequencing, whereby said method comprises: (i) providing RNA; (ii) generating (a) single-stranded first DNA strand(s) (cDNA), which is/are complementary to the RNA, by subjecting the RNA to reverse transcription by using a reverse transcriptase, a first set of oligonucleotide primers, and the RNA of step (i), and (iii) generating a second DNA strand by using a DNA polymerase, a second set of oligonucleotide primers, and the single-stranded cDNA of (ii), wherein a) the first set of oligonucleotide primers comprises a covalently coupled moiety at its/their 5′ terminal nucleotide, which blocks ligation at the 5′ terminus of the generated first DNA strand; or b) the second set of oligonucleotide primers comprises a covalently coupled moiety at its/their 5′ terminal nucleotide, which blocks ligation at the 5′ terminus of the generated second DNA strand. In some embodiments, the method further comprises the subsequent steps of: (iv) optionally end-repairing the double-stranded DNA strands using a polynucleotide kinase and an enzyme with polymerase and exonuclease activities to obtain end-repaired DNA strands; (v) optionally adding a terminal adenine to the 3′ termini of the DNA strands using a deoxynucleotidyl transferase enzyme; and (vi) ligation of adapters, which optionally comprise terminal thymines, to the DNA strands, which optionally comprise 3′ terminal adenines. Said methods may further comprise sequence analysis of the generated DNA.
In some embodiments, said method comprises: (i) providing RNA; (ii) generating (a) single-stranded first DNA strand(s) (cDNA), which is/are complementary to the RNA, by subjecting the RNA to reverse transcription by using a reverse transcriptase, a first set of oligonucleotide primers, and the RNA of step (i); (iii) generating a second DNA strand by using a DNA polymerase, a second set of oligonucleotide primers, and the single-stranded cDNA of (ii); (iv) ligating adapters to the double-stranded DNA of step (iii); and (v) sequencing the generated DNA, wherein a) the first set of oligonucleotide primers comprises a covalently coupled moiety at its/their 5′ terminal nucleotide, which blocks ligation at the 5′ terminus of the generated first DNA strand; or b) the second set of oligonucleotide primers comprises a covalently coupled moiety at its/their 5′ terminal nucleotide, which blocks ligation at the 5′ terminus of the generated second DNA strand. By generating the second DNA strand, a double-stranded DNA is generated. In some embodiments of the above-mentioned method, prior to step (iv), the method comprises the step of: (iii)(a) end-repairing the double-stranded DNA strands using a polynucleotide kinase and an enzyme with polymerase and exonuclease activities to obtain end-repaired DNA strands. In some embodiments, step (iii)(a) is followed by step (iii)(b) comprising adding a terminal adenine to the 3′ termini of the DNA strands by using a deoxynucleotidyl transferase enzyme, wherein the adapters comprise 3′ terminal thymines, which in step (iv) ligate to the DNA strands comprising 3′ terminal adenines. In some embodiments, the oligonucleotide primers, which are covalently coupled to a blocking moiety and/or unmodified oligonucleotide primers, are random oligonucleotide primers. In some embodiments, said methods comprise the initial step of extracting and optionally enriching the RNA of interest. In some embodiments, the extracted RNA is fragmented to an average size of 19-510 bp.
In some embodiments of the above methods, the molecules may be attached to a solid support for paired-end sequencing. In some embodiments a “moiety, which blocks ligation”, or a “blocking moiety” refers to a specific part of a larger molecule, which is more than one atom, herein the part of a modified oligonucleotide, which is covalently coupled to the 5′ nucleotide of a modified primer oligonucleotide. Said moiety preferably blocks any ligation at the site, where the moiety is located, preferably at the 5′ terminal nucleotide of the 5′ terminus of an oligonucleotide. In some embodiments of the above methods, the oligonucleotide primer comprising a blocking moiety is characterized in that (i) the oligonucleotide comprises at the 5′ terminal nucleotide a 5′ phosphate that is not free, wherein optionally a 5′ OH group or a 5′ phosphate group at the 5′ terminal nucleotide is covalently coupled to the moiety, which blocks ligation; (ii) the base of the 5′ terminal nucleotide is not any one of thymine, adenine, cytosine, guanine and uracil; (iii) one or both 2′ hydrogen(s) of the deoxyribose of the 5′ terminal nucleotide is/are replaced by another atom or a blocking moiety; and/or (iv) the oligonucleotide comprises a 5′ terminal nucleotide having a pentose in a sterical conformation, which is not the sterical conformation of ribose or deoxyribose in RNA or DNA. In some embodiments of the above methods, the oligonucleotide primers comprising a covalently coupled moiety, which blocks ligation, comprise a 5′ OH or a free 5′ phosphate group at the 5′ terminal nucleotide before being covalently coupled to a moiety, which confers the property of ligation-blocking.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2016/193490, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a poly(alkylene oxide) polymer based size selective DNA isolation method for isolating DNA molecules having a size above a certain cut-off value from a DNA containing sample, comprising (a) preparing a binding mixture comprising the DNA containing sample, at least one poly(alkylene oxide) polymer and at least one divalent cation, wherein said binding mixture has a pH that lies in the range of 8 to 10 and binding precipitated DNA molecules to a solid phase having an unmodified silicon containing surface, thereby providing a solid phase having bound thereto DNA molecules having a size above the cut-off value, wherein under the used binding conditions DNA molecules having a size which is less than the cut-off value substantially do not bind to the solid phase; (b) separating the bound DNA molecules from the remaining sample; optionally washing the bound DNA molecules; and optionally eluting the bound DNA molecules from the solid phase. In some embodiments, DNA molecules having a size above the desired cut-off value efficiently bind to the solid phase with high yield while DNA molecules having a size below said cut-off value are predominantly not bound and thus are not recovered in step (a). The cut-off value can be adjusted by modifying the concentration of the poly(alkylene oxide) polymer in the binding mixture as it is known from the prior art and also demonstrated by the examples. The presence of the divalent cation and the alkaline pH value as specified ensures efficient binding of the DNA molecules having a size above the cut-off value, even though a solid phase having an unmodified silicon containing surface is used.
The method is particularly suitable for isolating adapter ligated DNA molecules as target DNA molecules from an adapter ligation sample and for removing adapter monomers and adapter-adapter ligation products.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,811,759, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for creating a signature ncRNA profile for a disease state or condition, said method comprising: a. determining a first ncRNA profile from a first source, said first source being characterized as being free from said disease state or condition; b. determining a second ncRNA profile from a second source, said second source characterized as being positive for said disease state or condition, said first and second ncRNA profile being obtained according to a method for determining a profile of a plurality of target ncRNA molecules in a RNA sample, said method comprising the steps of: i. providing said RNA sample from a subject, said sample containing said plurality of target ncRNAs; ii. contacting said sample with a first oligonucleotide specific for each of said target ncRNAs to be detected under conditions appropriate to form a complex between said first oligonucleotides and said target ncRNAs, each of said first oligonucleotides comprising a first signal generator to generate a first detectable signal and each of said first oligonucleotides having a first Tm for binding each of said target ncRNAs that is substantially the same; iii. contacting said sample with a second oligonucleotide capable of binding each of said target ncRNAs to be detected under conditions appropriate to form a complex between said second oligonucleotides and said target ncRNAs, said second oligonucleotide comprising a second signal generator to generate a second detectable signal and each of said second oligonucleotides having a second Tm for binding each of said target ncRNAs that is substantially the same; iv.
determining the presence of said plurality of target ncRNA in said sample by measuring the first and second detectable signals; and v. generating a profile of the sample based on the target ncRNAs detected; and c. comparing said first and second ncRNA profiles and identifying those ncRNA molecules that are altered in said second ncRNA profile to create a signature ncRNA profile for said disease state or condition.
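For illustration only, the comparison in step c. above can be sketched in code. The following is a minimal sketch and not the method of the incorporated patent; it assumes each profile is a mapping from an ncRNA name to a measured signal, and that "altered" means a fold change at or beyond a chosen threshold. The function name, the threshold `min_fold`, the `pseudocount`, and the example values are all hypothetical.

```python
def signature_profile(disease_free, disease_positive, min_fold=2.0, pseudocount=1.0):
    """Return {ncRNA name: fold change} for ncRNAs altered in the positive profile.

    An ncRNA is treated as altered when its signal in the disease-positive
    profile is at least min_fold times higher or lower than in the
    disease-free profile. The pseudocount avoids division by zero for
    ncRNAs absent from one profile. Both parameters are assumptions.
    """
    signature = {}
    for name in sorted(set(disease_free) | set(disease_positive)):
        reference = disease_free.get(name, 0.0) + pseudocount
        observed = disease_positive.get(name, 0.0) + pseudocount
        fold = observed / reference
        if fold >= min_fold or fold <= 1.0 / min_fold:
            signature[name] = fold
    return signature


if __name__ == "__main__":
    # Hypothetical signal values, for demonstration only.
    first_profile = {"ncRNA_A": 99.0, "ncRNA_B": 49.0, "ncRNA_C": 9.0}
    second_profile = {"ncRNA_A": 399.0, "ncRNA_B": 49.0, "ncRNA_C": 1.0}
    print(signature_profile(first_profile, second_profile))
```

In this sketch, ncRNAs whose signal is unchanged between the two sources are excluded from the signature, while both increased and decreased ncRNAs are retained.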

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2005/0244847, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for amplifying a circular nucleic acid template, comprising contacting the template with a reaction mixture comprising a thermostable polymerase, individual nucleotides and forward and reverse primers complementary to a common region within the template, wherein the common region is preferably from about 80 to 150 base pairs in length. In one embodiment, the 5′ ends of the primers hybridize to opposite strands of the template about 10 to 50 base pairs apart, still more preferably from zero to twenty-five base pairs apart. In some embodiments, the 5′ end of the forward primer will generally be proximal to the 5′ end of the reverse primer and distal to the 3′ end of the reverse primer when the primers are hybridized to the template. In a particularly preferred embodiment, the common region is a conserved region, e.g., an origin of replication, within an extrachromosomal nucleic acid. The reaction mixture may further include a reaction buffer comprising a weak organic base and a weak organic acid. Additionally or alternatively, detection of a genetic biomarker can include reagents for performing in vitro amplification of extrachromosomal DNA, including solutions supporting amplification and subsequent ligation reactions and solutions supporting a combined amplification and ligation reaction, i.e., which provide the appropriate environment for simultaneous polymerase and ligase enzyme activity.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/137826, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods for enriching template nucleic acids and methods for generating a sequencing library. In some embodiments, the method for enriching template nucleic acids or for generating a sequencing library comprises: a) hybridizing template nucleic acids to oligonucleotides bound to a solid surface and initially comprising at least two functional sequence elements; b) extending the surface bound oligonucleotides hybridized to the template nucleic acids to form a double strand; c) optionally modifying the double-stranded nucleic acids generated in step b); d) 3′ truncation of the surface bound oligonucleotides that have not been used in the template nucleic acid hybridization of step a); and e) optionally modifying the single-stranded surface bound oligonucleotides generated in step d). In some embodiments, the method further comprises the additional step of: f) hybridizing further template nucleic acids to the surface bound oligonucleotides or using functional sequence elements within surface bound oligonucleotides for a downstream application. In some embodiments, the downstream application is nucleic acid amplification on the solid surface, de-coupling from solid surface, in-vitro transcription by the use of an RNA polymerase promoter within the oligonucleotide, labeling of immobilized nucleic acid by the use of a primer binding site or a molecular barcode region for identification within the oligonucleotide, sequencing, or a combination thereof. In some embodiments, at least one of the functional sequence elements is a hybridization site or preferably a sequence useful for a downstream application. In another embodiment, the functional sequence elements are consecutive or overlapping.
In certain embodiments, the functional sequence elements are separated by predefined cleavage sites or generated by hybridization of protecting oligonucleotides to the surface bound oligonucleotides. In some embodiments, steps a) to c) are repeated at least once with template nucleic acids from different samples. In other embodiments, steps a) to e) are repeated at least once with template nucleic acids from the same sample or from different samples, optionally wherein the repetition(s) is (are) performed in parallel. In some embodiments, the density of the surface bound oligonucleotides is between 500-500000 oligonucleotides/μm², more preferably 750-200000 oligonucleotides/μm², most preferably 1000-100000 oligonucleotides/μm². In other embodiments, the surface bound oligonucleotides comprise 2 to 20 functional sequence elements, more preferably 2 to 10, most preferably 2 to 5. In certain embodiments, the length of the surface bound oligonucleotides is within the range of 4-200 nt, preferably 10-200 nt, more preferably 6-180 nt, more preferably 8-160 nt, more preferably 10-140 nt, most preferably 20-100 nt. In some embodiments, the length of the surface bound oligonucleotides is 10 nt, preferably 20 nt. In some embodiments, all functional sequence elements of the same position within a surface bound oligonucleotide have a unique sequence or comprise 2-100000, preferably 2-50000, more preferably 2-25000, more preferably 2-10000, more preferably 2-5000, more preferably 2-2500, most preferably 2-1000 different sequences. In certain embodiments, the 3′ truncation is achieved enzymatically or chemically. In some embodiments, the double-stranded nucleic acids bound to the surface are modified by introducing barcode sequences, adding sequencing adaptors, adding a fluorophore at the terminus, incorporation of modified bases, or other modifications.
In other embodiments, the single-stranded oligonucleotides bound to the surface are modified by adding biotin, labeling moieties, blocking moieties, or other modifications.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/013598, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, adapters and kits for performing single end duplex DNA sequencing. In some embodiments, the method for performing DNA sequencing comprises: (a) performing a ligation reaction in the presence of a plurality of double-stranded target nucleic acids and a first set of substantially complementary double-stranded adapters to generate ligation products, wherein each adapter of the first set comprises a first strand and a second strand, the first strand comprises, from 5′ to 3′, a 5′ region that is 10 or more nucleotides in length, a molecular tag sequence, and an optional 3′ region, the second strand comprises, from 3′ to 5′, a 3′ region that comprises a sequence fully complementary to a 10-nucleotide or longer portion of the 5′ region of the first strand, a fully complementary sequence of the molecular tag sequence of the first strand, and an optional 5′ region, at least one mismatch between the first and second strands is located in the 3′ region of the first strand if the 3′ region is present, and/or in the 5′ region that is 3′ to the 10-nucleotide or longer portion of the 5′ region of the first strand, and different adapters of the first set comprise different molecular tag sequences in their first strands and corresponding fully complementary sequences of the different molecular tag sequences in their second strands, but are otherwise identical to each other, (b) performing an amplification reaction using the ligation products of step (a) as templates to generate amplification products, wherein the amplification products comprise one or more locations that do not form complementary base pairs in the first and second strands of the substantially complementary double-stranded adapters,
and (c) performing sequencing reactions using amplification products of step (b) or their further amplification products as templates to obtain sequence reads that comprise the one or more locations where a complementary base pair is not formed in the first and second strands of the double-stranded adapters. Additionally or alternatively, detection of a genetic biomarker can include a set of substantially complementary double-stranded adapters, comprising at least 16 different adapters, wherein each adapter of the set comprises a first strand and a second strand, the first strand comprises, from 5′ to 3′, a 5′ region, a molecular tag sequence, and an optional 3′ region, the second strand comprises, from 3′ to 5′, a 3′ region that comprises a sequence fully complementary to a 10-nucleotide or longer portion of the 5′ region of the first strand, a fully complementary sequence of the molecular tag sequence of the first strand, and an optional 5′ region, at least one mismatch between the first and second strands is located in the 3′ region of the first strand if the 3′ region is present, and/or in the 5′ region that is 3′ to the 10-nucleotide or longer portion of the 5′ region of the first strand, and different adapters comprise different molecular tag sequences in their first strands and corresponding fully complementary sequences of the different molecular tag sequences in their second strands, but are otherwise identical to each other. Additionally or alternatively, detection of a genetic biomarker can include a kit, comprising: (1) a set of substantially complementary double-stranded adapters, and (2) a ligase.
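For illustration only, the molecular tag requirements of the adapter set described above (at least 16 different adapters, each with a distinct first-strand tag and a fully complementary second-strand tag) can be checked computationally. The following is a minimal sketch and not part of the incorporated disclosure; it assumes both strands are written 5′ to 3′, so that a fully complementary tag is the reverse complement. The function names and the 2-nucleotide tag length are hypothetical and chosen only because 4² yields exactly 16 tags.

```python
from itertools import product

# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_tag_pairs(tag_length=2):
    """Pair every possible tag of tag_length with its reverse complement.

    Each pair is (first-strand tag, second-strand tag), both written 5' to 3'.
    """
    tags = ("".join(bases) for bases in product("ACGT", repeat=tag_length))
    return [(tag, reverse_complement(tag)) for tag in tags]


def is_valid_tag_set(tag_pairs, min_adapters=16):
    """Check the tag-related requirements for an adapter set.

    Requires at least min_adapters adapters, distinct first-strand tags,
    and a second-strand tag fully complementary to each first-strand tag.
    """
    first_tags = [first for first, _ in tag_pairs]
    distinct = len(set(first_tags)) == len(first_tags)
    paired = all(second == reverse_complement(first) for first, second in tag_pairs)
    return distinct and paired and len(tag_pairs) >= min_adapters


if __name__ == "__main__":
    pairs = build_tag_pairs(2)
    print(len(pairs), is_valid_tag_set(pairs))
```

The sketch covers only the molecular tag sequences; the 5′ region, the optional 3′ region, and the required mismatch between strands are outside its scope.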

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/165289, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include primers, primer sets, kits and methods for multiple displacement amplification (MDA). In some embodiments, the method for amplifying nucleic acids by multiple displacement amplification comprises: performing one or more separate multiple displacement amplification reactions, wherein each reaction is performed in the presence of: (1) a primer set, wherein each primer of the primer set comprises a self-complementary sequence at its 5′ terminus and a random sequence or a semi-random sequence at its 3′ terminus, and wherein the self-complementary sequences in each primer set are the same, but different from the self-complementary sequences in another primer set, (2) a DNA polymerase having a strand displacement activity, and (3) target nucleic acids. In certain embodiments, the self-complementary sequences in one or more primer sets are each 6 to 20 nucleotides in length. In certain embodiments, the random sequences or the semi-random sequences in one or more primer sets are 4 to 20 nucleotides in length. Preferably, the primers are resistant to 3′-5′ exonuclease proofreading activity. In certain embodiments, the DNA polymerase having a strand displacement activity is Phi29 polymerase. In certain embodiments, at least 2 separate multiple displacement amplification reactions are performed. In certain embodiments, the target nucleic acids used in one or more separate multiple displacement amplification reactions are genomic DNA from one or more different single cells, such as human cells. In certain embodiments, the multiple displacement amplification is performed at a temperature from about 20° C.
to about 40° C., such as cycling between two temperatures within the above-noted range or under an isothermal condition. In certain embodiments where a plurality of separate multiple displacement amplification reactions are performed, the method further comprises: pooling the nucleic acids amplified from the plurality of multiple displacement amplification reactions together, generating a sequencing library using the pooled amplified nucleic acids, and sequencing the pooled amplified nucleic acids. Additionally or alternatively, detection of a genetic biomarker can include a primer set, wherein each primer in the primer set comprises a self-complementary sequence at its 5′ terminus and a random sequence or a semi-random sequence at its 3′ terminus, and wherein the self-complementary sequences of the primers are identical to each other. Additionally or alternatively, detection of a genetic biomarker can include a plurality of primer sets, wherein each primer comprises a self-complementary sequence at its 5′ terminus and a random sequence or a semi-random sequence at its 3′ terminus, wherein the self-complementary sequences of primers in each primer set are the same, but different from the self-complementary sequences of primers in another primer set. In certain embodiments, the plurality of primer sets comprises at least 3 different primer sets.
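For illustration only, the structure of such a primer set (a shared self-complementary 5′ sequence followed by a random 3′ sequence) can be sketched in code. The following is a minimal sketch and not part of the incorporated disclosure; it assumes that a self-complementary sequence is one that equals its own reverse complement, and it applies the 6 to 20 nucleotide and 4 to 20 nucleotide length ranges mentioned above. The function names and the example sequence GAATTC are hypothetical.

```python
import random

# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def make_primer_set(self_comp: str, random_len: int, n_primers: int, rng: random.Random):
    """Build primers sharing one self-complementary 5' sequence.

    Each primer is the shared 5' sequence followed by a random 3' sequence.
    The length limits mirror the ranges given in the text.
    """
    if not (6 <= len(self_comp) <= 20 and is_self_complementary(self_comp)):
        raise ValueError("5' sequence must be self-complementary and 6 to 20 nt long")
    if not 4 <= random_len <= 20:
        raise ValueError("random 3' sequence must be 4 to 20 nt long")
    return [
        self_comp + "".join(rng.choice("ACGT") for _ in range(random_len))
        for _ in range(n_primers)
    ]


if __name__ == "__main__":
    # Seeded generator so the demonstration is reproducible.
    for primer in make_primer_set("GAATTC", 8, 3, random.Random(0)):
        print(primer)
```

Separate primer sets for separate reactions would be built by calling the function with a different self-complementary 5′ sequence for each set, which allows pooled amplification products to be traced back to their reaction of origin.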

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/085321, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method for producing a sterilized composition suitable for stabilizing an extracellular nucleic acid population of a biological sample, the method comprising: a) providing a composition comprising: i. at least one caspase inhibitor, and ii. at least one compound selected from a thioalcohol (preferably N-acetyl-cysteine or glutathione), a water-soluble vitamin, and vitamin E or a derivative thereof; and b) irradiating the composition for sterilization. Additionally or alternatively, detection of a genetic biomarker can include a method for stabilizing an extracellular nucleic acid population comprised in a cell-containing biological sample comprising: a) obtaining i) a sterilized composition suitable for stabilizing an extracellular nucleic acid population of a biological sample, or ii) a composition above in sterilized form; and b) contacting the cell-containing biological sample with the sterilized composition for stabilization. Additionally or alternatively, detection of a genetic biomarker can include a method for isolating extracellular nucleic acids from a stabilized cell-containing biological sample comprising: a) stabilizing the cell-containing biological sample according to the method; and b) isolating extracellular nucleic acids. Additionally or alternatively, detection of a genetic biomarker can include a method for processing and/or analyzing extracellular nucleic acids comprising: a) isolating extracellular nucleic acids from a stabilized cell-containing biological sample according to the method; and b) processing and/or analyzing the isolated extracellular nucleic acids.
Additionally or alternatively, detection of a genetic biomarker can include a method for producing a sterilizable composition, wherein the composition in sterilized form is suitable for stabilizing an extracellular nucleic acid population of a biological sample, the method comprising: a) preparing a composition comprising: i. at least one caspase inhibitor, and ii. at least one compound selected from a thioalcohol (preferably N-acetyl-cysteine or glutathione), a water-soluble vitamin, and vitamin E or a derivative thereof, and optionally b) sterilizing the composition. Additionally or alternatively, detection of a genetic biomarker can include a sterilizable composition, wherein the composition in sterilized form is suitable for stabilizing an extracellular nucleic acid population of a biological sample, wherein the composition is a composition as provided in step a) of the method. It comprises i. at least one caspase inhibitor, and ii. at least one compound selected from a thioalcohol (preferably N-acetyl-cysteine or glutathione), a water-soluble vitamin, and vitamin E or a derivative thereof. The sterilizable composition, which can be a composition as provided in step a) of the method, can in embodiments be sterilized to provide a sterilized composition. Additionally or alternatively, detection of a genetic biomarker can include a sample collection device such as a container, preferably a sample collection tube, comprising the sterilizable composition above. Additionally or alternatively, detection of a genetic biomarker can include the use of at least one compound selected from the group consisting of a thioalcohol (preferably N-acetyl-cysteine or glutathione), a water-soluble vitamin, and vitamin E or a derivative thereof, for protecting a composition suitable for stabilizing an extracellular nucleic acid population of a biological sample or components thereof during sterilization by irradiation.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2016/170147, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of generating dsDNA, wherein the method comprises ligating a first and a second dsDNA, both optionally having one or two single-stranded end(s), in the presence of a DNA ligase and an agent which modulates the melting temperature of dsDNA. Additionally or alternatively, detection of a genetic biomarker can include a method for ligating a first and a second dsDNA, wherein both the first and the second dsDNAs comprise two ssDNA regions, whereby each of the ssDNA region ends of the first dsDNA ligates with each of the complementary ss region ends of the second dsDNA to provide ligated circular dsDNA in the presence of an agent which modifies the melting temperature of dsDNA. In some embodiments, the first or the second DNA is capable of conferring the ability to auto-replicate within competent cells. Additionally or alternatively, detection of a genetic biomarker can include a method for generation of a sequencing library, wherein the method comprises the steps of: (i) providing DNA fragments; (ii) end-repairing the DNA fragments by a polynucleotide kinase enzyme and an enzyme with polymerase and exonuclease activities to obtain blunt-ended, 5′ phosphorylated DNA fragments; (iii) optionally adding a terminal adenine to the end of the end-repaired DNA fragments by a deoxynucleotidyl transferase enzyme; and (iv) ligating the DNA fragments, optionally having the terminal adenine, with sequencing adapters, wherein preferably the adapters have a terminal thymidine if the fragments have a terminal adenine. In some embodiments, said method further comprises step (v), wherein the ligated fragments of step (iv) are purified and size-selected for sequencing. 
In some embodiments, said method further comprises step (vi), wherein the adapter-ligated fragments are amplified and the amplification product is optionally purified prior to sequencing. In another embodiment, the fragments of step (v) or (vi) are subjected to sequencing. Additionally or alternatively, detection of a genetic biomarker can include a kit comprising: (i) a DNA ligase; and (ii) an agent which modulates the melting temperature of dsDNA. In another embodiment, the agent which modulates the melting temperature of dsDNA is selected from any one of tetramethylammonium chloride (TMAC), piperazinium chloride, tetramethylpiperazinium chloride, tetraethylammonium chloride (TEAC), trimethylamine N-oxide (TMANO), 2-methyl-4-carboxy-5-hydroxy-3,4,5,6-tetrahydropyrimidine THP(A), 2-methyl-4-carboxy-3,4,5,6-tetrahydropyrimidine THP(B), non-ionic detergents, such as NP-40 and Triton® X-100, and mixtures thereof. In a preferred embodiment, the agent which modulates the melting temperature of dsDNA is selected from any one of tetramethylammonium chloride (TMAC), piperazinium chloride, tetramethylpiperazinium chloride, tetraethylammonium chloride (TEAC), trimethylamine N-oxide (TMANO), 2-methyl-4-carboxy-5-hydroxy-3,4,5,6-tetrahydropyrimidine THP(A), 2-methyl-4-carboxy-3,4,5,6-tetrahydropyrimidine THP(B), and mixtures thereof. In some embodiments, the agent which modulates the melting temperature of dsDNA is in a ligation buffer. In some embodiments, the ligase and the agent which modulates the melting temperature of dsDNA are in separate containers.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2016/135300, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and kits for specific inhibition of enzymes involved in the DNA preparation for NGS library construction protocols. Additionally or alternatively, detection of a genetic biomarker can include a method of generating a sequencing library, wherein the method comprises the steps of: (i) providing DNA fragments; (ii) end-repairing the DNA fragments by a polynucleotide kinase enzyme and an enzyme with polymerase and exonuclease activities to obtain blunt-ended, 5′ phosphorylated DNA fragments; (iii) optionally adding a terminal adenine to the end of the end-repaired DNA fragments by a deoxynucleotidyl transferase enzyme; and (iv) ligating the DNA fragments, optionally having the terminal adenine base, with sequencing adaptors by a DNA ligase; whereby after completion of step (ii) and/or the optional step (iii), the enzyme or enzymes used in that step or those steps is/are inactivated by the addition of one or more specific inhibitors. None of the inhibitors of the above method inhibits the enzyme activity of the subsequent step(s). In some embodiments, the steps of the above method which comprise enzyme inactivation by a specific inhibitor do not comprise heat-inactivation of said enzyme(s). In some embodiments, the steps of the above method which comprise enzyme inactivation by a specific inhibitor do not comprise a subsequent purification step from said enzyme(s). In some alternative embodiments, the steps of the above method which comprise addition of one or more specific inhibitors comprise an upstream heat-inactivation of said enzyme(s), but do not comprise a purification step from said enzyme(s). 
In some embodiments, the above method may further comprise step (v), wherein the ligated fragments of step (iv) are purified and size-selected for sequencing.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2001/023618, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a composition comprising an oligonucleotide and an annealing promoting compound (APC). The composition may be used, for example, to reduce the specificity of a hybridization reaction involving the oligonucleotide. The oligonucleotide may optionally be immobilized on a solid surface, where suitable solid surfaces include a nylon tip, a nylon bead, and a nylon membrane. In a preferred embodiment, the APC is an aminoalcohol, where the aminoalcohol comprises at least one amine group and at least one hydroxyl group. The composition may further comprise an acid and/or a buffer, so that the aminoalcohol may be present in the composition, either entirely or in part, as a salt of the aminoalcohol. The composition is preferably aqueous, and has a pH of between 4 and 10. Exemplary aminoalcohol APCs are 4-hydroxypiperidine, 1-methyl-3-piperidinemethanol, 4,4′-trimethylenebis(1-piperidineethanol), 3-piperidinemethanol, 1-ethyl-4-hydroxy-piperidine, 2-piperidineethanol, 3-hydroxy-1-methylpiperidine, 1-ethyl-3-hydroxy-piperidine, 4-hydroxy-1-methylpiperidine, 1-methyl-2-piperidinemethanol, 2-piperidinemethanol, 2,2,6,6-tetramethyl-4-piperidinol, 1,4-bis(2-hydroxyethyl)piperazine and 1-(2-hydroxyethyl)piperazine. Additionally or alternatively, detection of a genetic biomarker can include a method of decreasing the specificity of a hybridization reaction between two oligonucleotides. The method comprises adding an annealing promoting compound (APC) to a hybridization reaction between two oligonucleotides. Additionally or alternatively, detection of a genetic biomarker can include a method of decreasing the specificity of a hybridization reaction between two oligonucleotides. 
The method comprises mixing a first oligonucleotide, a second oligonucleotide, and an annealing promoting compound (APC) under conditions suitable for the formation of an oligonucleotide duplex. Additionally or alternatively, detection of a genetic biomarker can include a method of identifying a target oligonucleotide. The method comprises: (a) mixing a first oligonucleotide having a sequence complementary to the target oligonucleotide, a second oligonucleotide having a sequence complementary to the complement of the target oligonucleotide, an annealing promoting compound (APC), a polymerase, a buffer compatible with polymerase activity, and a target oligonucleotide; (b) heating the mixture of (a) to a temperature above the melting temperature of the first oligonucleotide and the second oligonucleotide and their respective complementary sequences; (c) reducing the temperature of the mixture of (b) to below the melting temperature, to thereby allow hybridization between the first oligonucleotide, the second oligonucleotide, and the target oligonucleotide; (d) raising the temperature of the mixture of (c) to a temperature compatible with polymerase activity; and (e) detecting a product of polymerization.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,792,403, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a system for generating a characterization model (including, e.g., variant type and/or zygosity) for a variant (e.g., a mutation) in a tissue or sample, e.g., a tumor, or tumor sample, from a subject, e.g., a human subject, e.g., a cancer patient. The system comprises: at least one processor operatively connected to a memory, the at least one processor when executing is configured to: a) acquire: i) a sequence coverage input (SCI), which comprises, for each of a plurality of selected subgenomic intervals (e.g., exons), a value for sequence coverage at the selected subgenomic intervals (including, e.g., a normalized sequence coverage value); ii) an SNP allele frequency input (SAFI), which comprises, for each of a plurality of selected germline SNPs, a value for the allele frequency in the tissue or sample, e.g., tumor sample; iii) a variant allele frequency input (VAFI), which comprises the allele frequency for said variant, e.g., mutation, in the tissue or sample, e.g., tumor sample; b) acquire values, determined as a function of SCI and SAFI, for: a genomic segment total copy number (C), for each of a plurality of genomic segments; a genomic segment minor allele copy number (M), for each of a plurality of genomic segments; and sample purity (p); and c) calculate one or both of: i) a value for variant type, e.g., mutation type, e.g., g, which is indicative of the variant being somatic, germline, subclonal somatic, or not-distinguishable, wherein the at least one processor when executing is configured to calculate the value for variant type, e.g., mutation type, as a function of VAFI, p, C, and M; ii) an indication of the zygosity (e.g., homozygous, heterozygous, and absent) of the variant, e.g., mutation, in 
the tissue or sample, e.g., tumor sample, as a function of C and M. In an embodiment, the system is configured such that the analysis can be performed without the need for analyzing non-tumor tissue from the subject. In an embodiment, the system is configured to determine for at least one of the tumor sample, the selected subgenomic intervals, and the selected germline SNPs that the variant type, e.g., mutation type, cannot be determined for analyzed values. In an embodiment, at least one processor when executing acquires the SCI calculated as a function (e.g., the log of the ratio) of the number of reads for a subgenomic interval and the number of reads for a control (e.g., a process-matched control). In an embodiment, at least one processor when executing is configured to calculate SCI as a function (e.g., the log of the ratio) of the number of reads for a subgenomic interval and the number of reads for a control (e.g., a process-matched control). In an embodiment, the at least one processor when executing is configured to validate that a minimum number of subgenomic intervals have been selected or analyzed. In an embodiment, at least one processor when executing is configured to validate that a minimum number of a plurality of germline SNPs have been selected or analyzed. In an embodiment, the minimum number of germline SNPs comprises at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, or 15,000 germline SNPs. In an embodiment, the SAFI is based, at least in part, on a minor allele frequency in the tumor sample. In an embodiment, the at least one processor when executing is configured to calculate, or acquire, SAFI based, at least in part, on a minor allele frequency in the tumor sample. In an embodiment, the SAFI is based, at least in part, on an alternative allele frequency (e.g., an allele frequency other than a standard allele in a human genome reference database). 
In an embodiment, the at least one processor when executing is configured to calculate, or acquire, SAFI based, at least in part, on an alternative allele frequency (e.g., an allele frequency other than a standard allele in a human genome reference database). In an embodiment, the at least one processor when executing is configured to access values of C, M, and p calculated from fitting a genome-wide copy number model to the SCI and the SAFI. In an embodiment, the at least one processor when executing is configured to calculate C, M, and p. In an embodiment, the at least one processor when executing generates a best fit between the genome-wide copy number model and the SCI and the SAFI to calculate C, M, and p. In an embodiment, values of C, M, and p fit a plurality of genome-wide copy number model inputs of the SCI and the SAFI. In an embodiment, the at least one processor when executing is configured to generate a user interface. In an embodiment, the user interface is configured to accept as input any one or more of: a sequence coverage input (SCI), which comprises, for each of a plurality of selected subgenomic intervals, e.g., exons, a value for sequence coverage at the selected subgenomic intervals (including, e.g., a normalized sequence coverage value); an SNP allele frequency input (SAFI), which comprises, for each of a plurality of selected germline SNPs, a value for the allele frequency in the tumor sample; a variant allele frequency input (VAFI), which comprises the allele frequency for said variant, e.g., mutation, in the tumor sample; a genomic segment total copy number (C), for each of a plurality of genomic segments; a genomic segment minor allele copy number (M), for each of a plurality of genomic segments; and sample purity (p). In an embodiment, responsive to the user interface input, e.g., for one or more (e.g., 2, 3, 4, 5 or all) of SCI, SAFI, VAFI, C, M, or p, the system generates a characterization model, e.g., a characterization model for a variant. 
Additionally or alternatively, detection of a genetic biomarker can include a method of characterizing a variant, e.g., a mutation, in a tissue or sample, e.g., a tumor, or tumor sample, from a subject, e.g., a human, e.g., a cancer patient, comprising: a) acquiring: i) a sequence coverage input (SCI), which comprises, for each of a plurality of selected subgenomic intervals, e.g., exons, a value for normalized sequence coverage at the selected subgenomic intervals; ii) an SNP allele frequency input (SAFI), which comprises, for each of a plurality of selected germline SNPs, a value for the allele frequency in the tumor or sample, e.g., tumor sample; iii) a variant allele frequency input (VAFI), which comprises the allele frequency for said variant, e.g., mutation, in the tumor or sample, e.g., tumor sample; b) acquiring values, as a function of SCI and SAFI, for: C, for each of a plurality of genomic segments, wherein C is a genomic segment total copy number; M, for each of a plurality of genomic segments, wherein M is a genomic segment minor allele copy number; and p, wherein p is sample purity; and c) acquiring one or both of: i) a value for variant type, e.g., mutation type, e.g., g, which is indicative of the variant, e.g., a mutation, being somatic, a subclonal somatic variant, germline, or not-distinguishable, and is a function of VAFI, p, C, and M; ii) an indication of the zygosity of the variant, e.g., mutation, in the tumor or sample, e.g., tumor sample, as a function of C and M. In an embodiment, the analysis can be performed without the need for analyzing non-tumor tissue from the subject. In an embodiment, the analysis is performed without analyzing non-tumor tissue from the subject, e.g., non-tumor tissue from the same subject is not sequenced. In an embodiment, the SCI comprises values that are a function, e.g., the log of the ratio, of the number of reads for a subgenomic interval, e.g., from the sample, and the number of reads for a control, e.g., a process-matched control. 
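The log-ratio form of the SCI described above can be illustrated with a minimal Python sketch. The function name `sequence_coverage_input`, the depth normalization by total read count, and the `pseudocount` parameter are assumptions introduced for illustration only; the incorporated reference does not specify them.

```python
import math


def sequence_coverage_input(sample_reads, control_reads, pseudocount=0.5):
    """Return one log2(sample / control) value per subgenomic interval.

    Assumption: each read count is first divided by the total read count of
    its own library so that libraries of different depth are comparable.
    The pseudocount (hypothetical) avoids taking the log of zero.
    """
    if len(sample_reads) != len(control_reads):
        raise ValueError("sample and control must cover the same intervals")
    sample_total = sum(sample_reads)
    control_total = sum(control_reads)
    if sample_total <= 0 or control_total <= 0:
        raise ValueError("each library needs at least one read")
    values = []
    for s, c in zip(sample_reads, control_reads):
        s_norm = (s + pseudocount) / sample_total
        c_norm = (c + pseudocount) / control_total
        values.append(math.log2(s_norm / c_norm))
    return values
```

For instance, calling `sequence_coverage_input([300, 100], [100, 100], pseudocount=0)` gives a positive value for the first interval (relative gain) and a negative value for the second (relative loss).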
In an embodiment, the SCI comprises values, e.g., log r values, for at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, or 10,000 subgenomic intervals, e.g., exons. In an embodiment, the SCI comprises values, e.g., log r values, for at least 100 subgenomic intervals, e.g., exons. In an embodiment, the SCI comprises values, e.g., log r values, for 1,000 to 10,000, 2,000 to 9,000, 3,000 to 8,000, 3,000 to 7,000, 3,000 to 6,000, or 4,000 to 5,000 subgenomic intervals, e.g., exons. In an embodiment, the SCI comprises values, e.g., log r values, for subgenomic intervals, e.g., exons, from at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, or 4,000 genes. In an embodiment, at least one, a plurality, or substantially all of the values comprised in the SCI are corrected for correlation with GC content. In an embodiment, a subgenomic interval, e.g., an exon, from the sample has at least 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1,000 reads. In an embodiment, each of a plurality, e.g., at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, or 10,000, of subgenomic intervals, e.g., exons, from the sample has a predetermined number of reads. In an embodiment, the predetermined number of reads is at least 10, 20, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, or 1,000. In an embodiment, the plurality of germline SNPs comprises at least 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, or 15,000 germline SNPs. In an embodiment, the plurality of germline SNPs comprises at least 100 germline SNPs. In an embodiment, the plurality of germline SNPs comprises 500 to 5,000, 1,000 to 4,000, or 2,000 to 3,000 germline SNPs. In an embodiment, the allele frequency is a minor allele frequency. 
In an embodiment, the allele frequency is an alternative allele frequency, e.g., the frequency of an allele other than a standard allele in a human genome reference database. In an embodiment, the method comprises characterizing a plurality of variants, e.g., mutants, in the tumor sample. In an embodiment, the method comprises characterizing at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants. In an embodiment, the method comprises characterizing variants, e.g., mutants, in at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 different genes. In an embodiment, the method comprises acquiring a VAFI for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants. In an embodiment, the method comprises performing one, two, or all of steps a), b), and c) for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 50, 100, 150, 200, 250, 300, 350, 400, 450, or 500 variants, e.g., mutants. In an embodiment, values of C, M, and p are, have been, or can be obtained by fitting a genome-wide copy number model to one or both of the SCI and the SAFI. In an embodiment, values of C, M, and p fit a plurality of genome-wide copy number model inputs of the SCI and the SAFI. In an embodiment, a genomic segment comprises a plurality of subgenomic intervals, e.g., exons, e.g., subgenomic intervals which have been assigned a SCI value. In an embodiment, a genomic segment comprises at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, or 500 subgenomic intervals, e.g., exons. In an embodiment, a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 subgenomic intervals, e.g., exons. 
In an embodiment, a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000 subgenomic intervals, e.g., exons. In an embodiment, a genomic segment comprises 10 to 1,000, 20 to 900, 30 to 700, 40 to 600, 50 to 500, 60 to 400, 70 to 300, 80 to 200, 80 to 150, 80 to 120, 90 to 110, or about 100 genomic SNPs which have been assigned a SAFI value. In an embodiment, a genomic segment comprises between 100 and 10,000, 100 and 5,000, 100 and 4,000, 100 and 3,000, 100 and 2,000, or 100 and 1,000 genomic SNPs which have been assigned a SAFI value. In an embodiment, each of a plurality of genomic segments is characterized by having one or both of: a measure of normalized sequence coverage, e.g., log r, that differs by no more than a preselected amount, e.g., the values for log2 r for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant; and SNP allele frequencies for germline SNPs that differ by no more than a preselected amount, e.g., the values for germline SNP allele frequencies for subgenomic intervals, e.g., exons, within the boundaries of the genomic segment differ by no more than a reference value, or are substantially constant. In an embodiment, the number of subgenomic intervals, e.g., exons, that are contained in, or are combined to form, a genomic segment is at least 2, 5, 10, 15, 20, 50, or 100 times the number of genomic segments. In an embodiment, the number of subgenomic intervals, e.g., exons, is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15 times the number of genomic segments. In an embodiment, a boundary for a genomic segment is provided. In an embodiment, the method comprises assembling sequences for subgenomic intervals, e.g., exons, into genomic segments. 
In an embodiment, the method comprises assembling sequences for subgenomic intervals with a method, e.g., a method comprising circular binary segmentation (CBS), an HMM-based method, a wavelet-based method, or a Cluster along Chromosomes method. In an embodiment, fitting the genome-wide copy number model to the SCI comprises using the equation of:

${{\log \; {Ratio}_{i}} = {\log_{2}\frac{{pC}_{i} + {2\left( {1 - p} \right)}}{{p\; \psi} + {2\left( {1 - p} \right)}}}},$

where ψ is tumor ploidy. In an embodiment, $\psi = (\sum_{i} l_{i} C_{i}) / \sum_{i} l_{i}$, where $l_{i}$ is the length of genomic segment $i$. In an embodiment, fitting the genome-wide copy number model to the SAFI comprises using the equation of:

${{AF} = \frac{{pM} + {1\left( {1 - p} \right)}}{{pC} + {2\left( {1 - p} \right)}}},$

where AF is allele frequency. In an embodiment, the fitting comprises using Gibbs sampling. In an embodiment, fitting comprises using, e.g., a Markov chain Monte Carlo (MCMC) algorithm, e.g., ASCAT (Allele-Specific Copy Number Analysis of Tumors), OncoSNP, or PICNIC (Predicting Integral Copy Numbers In Cancer). In an embodiment, fitting comprises using Metropolis-Hastings MCMC. In an embodiment, fitting comprises using a non-Bayesian approach, e.g., a frequentist approach, e.g., using least squares fitting. In an embodiment, g is determined by determining the fit of values for VAFI, p, C, and M to a model for somatic/germline status. In an embodiment, the method comprises acquiring an indication of heterozygosity for said variant, e.g., mutation. In an embodiment, sample purity (p) is global purity, e.g., is the same for all genomic segments. In an embodiment, the value of g is acquired by:

${{AF} = \frac{{pM} + {g\left( {1 - p} \right)}}{{pC} + {2\left( {1 - p} \right)}}},$

where AF is allele frequency. In an embodiment, a value of g that is close to 0, e.g., does not differ significantly from 0, indicates the variant is a somatic variant. In an embodiment, a value of g that is 0, or close to 0, e.g., within a predetermined distance from 0, e.g., a value of g of less than 0.4, indicates the variant is a somatic variant. In an embodiment, a value of g that is close to 1, e.g., does not differ significantly from 1, indicates the variant is a germline variant. In an embodiment, a value of g that is 1, or close to 1, e.g., within a predetermined distance from 1, e.g., a value of g of more than 0.6, indicates the variant is a germline variant. In an embodiment, a value of g that is less than 1 but more than 0, e.g., less than 1 by a predetermined amount and more than 0 by a predetermined amount, e.g., between 0.4 and 0.6, indicates an indistinguishable result. In an embodiment, a value of g that is significantly less than 0 is indicative of a subclonal somatic variant. In an embodiment, the value of g is acquired by:

$\mathrm{AF} = \frac{pM' + g(1 - p)}{pC + 2(1 - p)},$

where AF is allele frequency, and M′=C−M (e.g., when M is a non-minor allele frequency), e.g., the variant is a germline polymorphism if g=1 and the variant is a somatic mutation if g=0.
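The copy number model equations above, and the calculation of g from an observed allele frequency, can be illustrated with a minimal Python sketch. All function names are hypothetical, and the 0.4 and 0.6 cutoffs are the example thresholds given in the text rather than fixed values. The subclonal somatic call (g significantly less than 0) is not modeled here because no threshold for it is stated.

```python
import math


def tumor_ploidy(lengths, copy_numbers):
    """psi = sum_i(l_i * C_i) / sum_i(l_i) over genomic segments."""
    return sum(l * c for l, c in zip(lengths, copy_numbers)) / sum(lengths)


def expected_log_ratio(p, c_i, psi):
    """Expected log2 coverage ratio for a segment with total copy number c_i."""
    return math.log2((p * c_i + 2 * (1 - p)) / (p * psi + 2 * (1 - p)))


def expected_snp_af(p, m, c):
    """Expected germline SNP allele frequency: (pM + 1(1-p)) / (pC + 2(1-p))."""
    return (p * m + 1 * (1 - p)) / (p * c + 2 * (1 - p))


def solve_g(af, p, m, c):
    """Rearrange AF = (pM + g(1-p)) / (pC + 2(1-p)) to solve for g."""
    if not 0 <= p < 1:
        raise ValueError("g is only identifiable for purity 0 <= p < 1")
    return (af * (p * c + 2 * (1 - p)) - p * m) / (1 - p)


def classify_variant(g, low=0.4, high=0.6):
    """Apply the example cutoffs from the text (assumed, not mandatory)."""
    if g < low:
        return "somatic"
    if g > high:
        return "germline"
    return "indistinguishable"
```

As a check on the algebra, with p=0.5, C=2, and M=1, an observed allele frequency of 0.5 yields g=1 (germline), while an observed allele frequency of 0.25 yields g=0 (somatic).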

In an embodiment, the somatic/germline status is determined, e.g., when the sample purity is below about 40%, e.g., between about 10% and 30%, e.g., between about 10% and 20%, or between about 20% and 30%. In an embodiment: a value of M equal to 0 and not equal to C is indicative of absence of the variant, e.g., mutation, e.g., not existent in the tumor; a non-zero value of M equal to C is indicative of homozygosity of the variant, e.g., mutation, e.g., with loss of heterozygosity (LOH); a value of M equal to 0 and equal to C indicates a homozygous deletion of the variant, e.g., mutation, e.g., not existent in the tumor; and a non-zero value of M not equal to C is indicative of heterozygosity of the variant, e.g., mutation. In an embodiment, the method comprises acquiring an indication of zygosity for said variant, e.g., mutation. In an embodiment, the mutation status is determined as homozygous (e.g., LOH) if M=C≠0. In an embodiment, the mutation status is determined as homozygous deletion if M=C=0. In an embodiment, the mutation status is determined as heterozygous if 0<M<C. In an embodiment, the mutation is absent from the tumor if M=0 and C≠0. In an embodiment, the zygosity is determined, e.g., when the sample purity is greater than about 80%, e.g., between about 90% and 100%, e.g., between about 90% and 95%, or between about 95% and 100%. In an embodiment, the control is a sample of euploid (e.g., diploid) tissue from a subject other than the subject from which the tumor sample is from, or a sample of mixed euploid (e.g., diploid) tissues from one or more (e.g., at least 2, 3, 4, or 5) subjects other than the subject from which the tumor sample is from. In an embodiment, the method comprises sequencing each of the selected subgenomic intervals and each of the selected germline SNPs, e.g., by next generation sequencing (NGS). In an embodiment, the sequence coverage prior to normalization is at least about 10×, 20×, 30×, 50×, 100×, 250×, 500×, 750×, or 1000× the depth of the sequencing.
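The zygosity rules stated above in terms of M and C can be illustrated with a minimal Python sketch. The function name `zygosity` and the returned labels are hypothetical, and the sketch assumes integer copy numbers with 0 ≤ M ≤ C.

```python
def zygosity(m, c):
    """Classify a variant from copy number m (M) and total copy number c (C).

    Assumption: m and c are non-negative integers with m <= c.
    """
    if m < 0 or c < 0 or m > c:
        raise ValueError("expected 0 <= M <= C")
    if c == 0:
        return "homozygous deletion"  # M = C = 0
    if m == 0:
        return "absent"               # M = 0 and C != 0
    if m == c:
        return "homozygous (LOH)"     # M = C != 0
    return "heterozygous"             # 0 < M < C
```

The order of the checks matters: the M=C=0 case is tested first so that it is not reported as either absent or homozygous.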

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/0218113, which is hereby incorporated by reference in its entirety.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2017/0356053, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include the use of hybridization of sample nucleic acid to a bait set to evaluate a region of interest, e.g., to evaluate the clonal profile of a region of interest, in the sample. Additionally or alternatively, detection of a genetic biomarker can include a method of evaluating or providing a clonal profile of a subject interval, e.g., a subgenomic interval, or an expressed subgenomic interval (or of a cell containing the same), in a subject, comprising: (a) acquiring a nucleic acid library comprising a plurality of members, each member of the plurality comprising a nucleic acid from the subject, e.g., a plurality of tumor members from a solid tumor or hematologic malignancy (or premalignancy) sample; (b) contacting the library with a bait set to provide a plurality of selected members, each of which comprises the subject interval, or a portion thereof (sometimes referred to herein as a library catch); optionally, (c) amplifying each member of the plurality of selected members, e.g., to provide an amplified sequence of the subject interval; (d) acquiring the sequence of one or more occurrences of the subject interval; thereby providing or evaluating the clonal profile of a subject interval. In an embodiment, the method comprises evaluating the clonal profile of a subgenomic interval and of an expressed subgenomic interval. In an embodiment, the method comprises comparing the sequence of a first allele or signature (e.g., a first V segment) at the subject interval with a comparison value, e.g., a preselected value, e.g., a value that is a function of the sequence of a second allele or signature (e.g., a second V segment). 
In an embodiment, the method further comprises: (e) acquiring: (i) a value for the distribution, expression (the occurrence or level of transcribed copies of a subgenomic signature), abundance, or identity of a sequence, signature, or allele at the subject interval, e.g., the relative abundance of a sequence, a signature, or an allele, or the relative abundance of each of a plurality of sequences, signatures, or alleles, at the subject interval; or (ii) a value for variability, e.g., sequence variability arising from a somatic hypermutation, sequence variability arising from a VD, DJ, or VJ junction, e.g., by the formation of an indel at the junction, or a CDR, e.g., heavy chain CDR3, sequence variability, within a signature or subject interval, e.g., wherein a value for variability is a function of the number of different variants present for the subject interval in a subject or sample. In an embodiment, the method comprises providing the clonal profile of a sequence, allele, or signature, e.g., a V segment, or VDJ or VJ rearrangement, at a first subject interval; and i) a phenotype, e.g., disease state, of the subject; or ii) the genotype at a second subject interval. In an embodiment, step (d): (i) comprises acquiring the sequence of each of a plurality of occurrences of the subject interval, e.g., acquiring the sequence of a first occurrence of a subject interval comprising a V segment and of a second occurrence of the interval comprising the V segment, wherein the first and second occurrences differ by the diversity at a VD, DJ, or VJ junction; or (ii) comprises acquiring the sequence of a first subject interval and of a second different subject interval, e.g., wherein the first subject interval comprises a sequence from a first gene and the second subject interval comprises a sequence from a second gene. 
In an embodiment, step (d) comprises acquiring the sequence of each of a plurality of occurrences of the subject interval, e.g., a plurality of occurrences of a subject interval comprising a VDJ sequence, e.g., a plurality of occurrences of a subject interval comprising a VDJ sequence comprising a specific V segment, a specific D segment, and a specific J segment. In an embodiment, the method comprises acquiring a value for (e)(i). In an embodiment, the value of (e)(i) comprises a value for the abundance of a sequence, signature, or allele (e.g., a first V segment) in a subject interval relative to a comparison value, e.g., a preselected value, e.g., a value that is a function of the abundance of a second sequence, signature, or allele (e.g., a second V segment). In an embodiment, the value of (e)(i) comprises a value for the abundance of an event, e.g., a sequence, allele, or signature, e.g., a mutation or rearrangement, in a subject interval, relative to a comparison value, e.g., a preselected value, e.g., a value that is a function of the abundance of a sequence lacking the event, e.g., an unmutated or unrearranged sequence in the subject interval. 
In an embodiment, the value of (e)(i) comprises a value of relative abundance for each of X unique (i.e., different from one another) sequences, signatures, or alleles, at a subject interval. Additionally or alternatively, detection of a genetic biomarker can include a method of evaluating a subject for the occurrence of a whole arm or large rearrangement, e.g., a rearrangement, e.g., a translocation, duplication, insertion, or deletion, comprising, e.g., at least 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or 90%, or all of a chromosome arm, comprising: (a) acquiring a nucleic acid library comprising a plurality of members, each member of the plurality comprising nucleic acid from the subject; (b) contacting the library with a bait set, e.g., under conditions of solution hybridization, to provide a plurality of selected members, each of which comprises a subject interval, or a portion thereof (sometimes referred to herein as a library catch); (c) amplifying each member of the plurality, e.g., by a method that does not rely on a sequence specific interaction with target/subject nucleic acid in the member, e.g., by amplifying each member of the plurality with a primer that does not bind to target/subject nucleic acid in the member; and (d) acquiring the sequence of a plurality of subject intervals, wherein said plurality of subject intervals is disposed on a chromosome such as to allow determination of a whole arm or large rearrangement. 
Additionally or alternatively, detection of a genetic biomarker can include a method of evaluating a subject, comprising: (a) acquiring a nucleic acid library comprising a plurality of members, each member of the plurality comprising nucleic acid from the subject, e.g., a plurality of tumor members from a hematological-cancer sample; (b) contacting the library with a bait set, e.g., under conditions of solution hybridization, to provide a plurality of selected members, each of which comprises the subject interval, or a portion thereof (sometimes referred to herein as a library catch); (c) amplifying each member of the plurality of selected members, e.g., by a method that does not rely on a sequence specific interaction with target/subject nucleic acid in the member, e.g., by amplifying each member of the plurality of selected members with a primer that does not bind to target/subject nucleic acid in the member; (d) acquiring the sequence of a subgenomic interval and an expressed subgenomic interval; thereby evaluating the subject, wherein: (i) the method comprises contacting the library with a bait set that provides both a subgenomic interval and an expressed subgenomic interval; (ii) the method comprises contacting the library with a first bait set that provides a subgenomic interval and a second bait set that provides an expressed subgenomic interval; (iii) wherein the library comprises genomic DNA and is contacted with a bait set that provides a subgenomic interval and the method further comprises a second library which comprises cDNA which is contacted with the bait set to provide an expressed subgenomic interval; (iv) wherein the library comprises genomic DNA and is contacted with a bait set that provides a subgenomic interval and the method further comprises a second library which comprises cDNA which is contacted with a second bait set to provide an expressed subgenomic interval; or (v) the method comprises performing one of steps (a), (b) and (c) in a first reaction mix to provide a 
first subject interval, e.g., a subgenomic interval, and on a second reaction mix to provide a second subject interval, e.g., an expressed subgenomic interval, e.g., that corresponds to the subgenomic interval. Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a sample, e.g., a tumor sample from a hematologic malignancy (or premalignancy). The method comprises: (a) acquiring one or a plurality of libraries comprising a plurality of members from a sample, e.g., a plurality of tumor members from a tumor sample; (b) optionally, enriching the one or a plurality of libraries for preselected sequences, e.g., by contacting the one or a plurality of libraries with a bait set (or plurality of bait sets) to provide selected members (sometimes referred to herein as library catch); (c) acquiring a read for a subject interval, e.g., a subgenomic interval or an expressed subgenomic interval, from a member, e.g., a tumor member from a library or library catch, e.g., by a method comprising sequencing, e.g., with a next generation sequencing method; (d) aligning said read by an alignment method; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method) from said read for the preselected nucleotide position, thereby analyzing said tumor sample, optionally wherein: a read from each of X unique subject intervals (e.g., subgenomic intervals, expressed subgenomic intervals, or both) is aligned with a unique alignment method, wherein unique subject interval (e.g., subgenomic interval or expressed subgenomic interval) means different from the other X−1 subject intervals (e.g., subgenomic intervals, expressed subgenomic intervals, or both), and wherein unique alignment method means different from the other X−1 alignment methods, and X is at least 2. 
Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a sample, e.g., a tumor sample from a hematologic malignancy (or premalignancy). The method comprises: (a) acquiring one or a plurality of libraries comprising a plurality of members from a sample, e.g., a plurality of tumor members from the sample, e.g., the tumor sample; (b) optionally, enriching the one or a plurality of libraries for preselected sequences, e.g., by contacting the library with a bait set (or plurality of bait sets) to provide selected members, e.g., a library catch; (c) acquiring a read for a subject interval (e.g., a subgenomic interval or an expressed subgenomic interval) from a member, e.g., a tumor member from said library or library catch, e.g., by a method comprising sequencing, e.g., with a next generation sequencing method; (d) aligning said read by an alignment method; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method or a calling method) from said read for the preselected nucleotide position, thereby analyzing said tumor sample, optionally wherein a nucleotide value for a nucleotide position in each of X unique subject intervals (e.g., subgenomic intervals, expressed subgenomic intervals, or both) is assigned by a unique calling method, wherein unique subject interval (e.g., subgenomic interval or expressed subgenomic interval) means different from the other X−1 subject intervals (e.g., subgenomic intervals, expressed subgenomic intervals, or both), and wherein unique calling method means different from the other X−1 calling methods, and X is at least 2. The calling methods can differ, and thereby be unique, e.g., by relying on different Bayesian prior values.
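
By way of a non-limiting illustration, calling methods that differ only in their Bayesian prior values can be sketched as follows. This is a minimal sketch under stated assumptions and not the calling method of any particular embodiment: the function names, the interval labels, the prior values, the per-read error rate, and the expected allele fraction are all hypothetical, and the likelihood is assumed to be a simple two-hypothesis binomial model (real variant versus sequencing error).

```python
from math import comb

# Hypothetical per-interval priors: a known hotspot gets a larger prior
# probability of mutation than an arbitrary background position.
PRIORS = {"hotspot_interval": 1e-2, "background_interval": 1e-6}


def posterior_mutation_probability(alt_reads, depth, prior,
                                   error_rate=0.01, allele_fraction=0.05):
    """Posterior probability that the alternate reads reflect a real variant.

    Assumes two hypotheses: a variant present at `allele_fraction`, or
    sequencing error alone at `error_rate` (both values are illustrative).
    """
    def binomial_likelihood(p):
        # Probability of seeing exactly `alt_reads` alternate reads in `depth` reads.
        return comb(depth, alt_reads) * p**alt_reads * (1 - p)**(depth - alt_reads)

    weighted_variant = prior * binomial_likelihood(allele_fraction)
    weighted_error = (1 - prior) * binomial_likelihood(error_rate)
    return weighted_variant / (weighted_variant + weighted_error)


def call_mutation(interval, alt_reads, depth, threshold=0.99):
    """Call a mutation when the posterior meets the (assumed) threshold."""
    prior = PRIORS[interval]
    return posterior_mutation_probability(alt_reads, depth, prior) >= threshold


if __name__ == "__main__":
    # Identical read evidence, different priors, different calls.
    for name in PRIORS:
        print(name, call_mutation(name, alt_reads=10, depth=100))
```

In this sketch the same read evidence is called as a mutation in the interval with the larger prior and is not called in the interval with the smaller prior, which is one way two calling methods can be unique with respect to one another.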

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2015/0324519, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods, systems, and apparatuses for making more accurate variant calls based on sequencing reads of a sample, e.g., obtained from targeted sequencing. For example, once sequence reads are received and aligned to a reference sequence, sequence reads having a variant at a location can be counted. A first variant frequency of a particular variant measured at one location of a sample can be compared to one or more second variant frequencies of the particular variant measured at other positions and/or from other samples. The second variant frequency can correspond to an expected value for sequencing errors for a sequencing run. In some embodiments, a probability value indicating the confidence level that a variant is a true positive at a location can be calculated based on variant counts and total read counts at a plurality of locations in the target region in one or more samples. The probability value can then be compared with a threshold level to determine whether the detected variant is a true positive. In other embodiments, a difference in variant counts and total read counts at a same location in a test sample and a reference sample (e.g., assumed to only have sequencing errors at the location) can be used to determine whether a variant is a true positive in a test sample. According to one embodiment, a method can detect true positives for rare variants in a target region of a test sample. For each sample, variant frequencies for variants of a same variant class at locations where a reference allele exists on a reference sequence can be calculated using variant counts and total read counts. 
A distribution of the variant frequencies for variants of the same class can be used to determine the probability value of a variant at a location in the test sample with a determined variant frequency. Based on the probability value, the variant at the location in the test sample is classified as either a true positive (mutation) or a false positive. In other embodiments, a method can detect true positives for rare variants in a target region of a test sample by using a comparison with one or more reference samples. A variant count and a wild type count for a specific variant at a specific location in the test sample can be determined from the aligned sequence reads, and compared with a variant count and a wild type count for the specific variant at the specific location in the one or more reference samples to determine a probability value. Based on the probability value, the specific variant at the specific location in the test sample is classified as either a true positive or false positive. In one embodiment, a computer-implemented method of detecting low frequency variants in a target region in a first sample is provided. 
In some embodiments, the method comprises (at a computer system) receiving a plurality of sequence reads obtained from sequencing DNA fragments from one or more samples, the one or more samples including the first sample, wherein the sequencing includes targeting the target region in the DNA fragments; aligning the plurality of sequence reads to the target region of a reference sequence; identifying a first candidate variant having a first allele at a first location of the target region based on sequence reads of the first sample differing from a reference allele at the first location of the reference sequence; determining a first variant frequency for the first allele at the first location based on sequence reads of the first sample that align to the first location of the reference sequence; identifying the first candidate variant as corresponding to a first variant class selected from a plurality of variant classes, each variant class of the plurality of variant classes corresponding to a different type of variant; identifying a set of second locations in the target region of the reference sequence that have the reference allele, wherein at least 50% of the other locations in the one or more samples exhibit a false positive for the first allele, and wherein the set of second locations includes the first location; at each of the set of second locations and for each of the one or more samples: determining a second variant frequency of the first allele based on sequence reads of the sample that align to the second location of the reference sequence, the second variant frequencies forming a statistical distribution; comparing the first variant frequency to a statistical value of the statistical distribution to determine a probability value of the first variant frequency relative to the statistical value of the statistical distribution; and comparing the probability value to a threshold value as part of determining whether the first candidate variant is a true positive in the first sample for the 
first allele, the threshold value differentiating between false positives and true positives for the first allele. In certain embodiments, the reference sequence corresponds to a consensus sequence as determined from normal cells. In some embodiments, the one or more samples are derived from cell-free DNA fragments. In some embodiments, the one or more samples are derived from RNA of a biological sample. In some embodiments, the plurality of samples are sequenced in a single sequencing run. In other embodiments, the statistical value of the statistical distribution includes a mean value. In other embodiments, the probability value is a z-score, modified z-score, cumulative probability, Phred quality score, or modified Phred quality score. In other embodiments, the statistical distribution is the statistical distribution of logarithmic transformations of the second variant frequencies. In other embodiments, the threshold is determined using a support vector machine classifier based on training data obtained from one or more sequencing runs. In other embodiments, the threshold is a function of variant frequency. In another embodiment, a computer-implemented method of detecting a variant having a first allele at a first location in a target region in a first sample is provided. 
In some embodiments, the method comprises (at a computer system): receiving a plurality of sequence reads obtained from sequencing DNA fragments from at least two samples, the at least two samples including the first sample, wherein the sequencing includes targeting the target region in the DNA fragments; aligning the plurality of sequence reads to the target region of a reference sequence; identifying whether the first allele exists at the first location in each sample of the at least two samples based on aligned sequence reads of each sample at the first location differing from a reference allele at the first location of the reference sequence; determining a variant count of the first allele at the first location and a wild type count of the reference allele at the first location for each sample of the at least two samples; selecting, from the at least two samples, at least one sample as a reference sample; comparing a first variant count of the first allele at the first location and a first wild type count of the reference allele at the first location for the first sample to a second variant count of the first allele at the first location and a second wild type count of the reference allele at the first location for the reference sample to determine a probability value of the variant having the first allele at the first location for the first sample; and comparing the probability value to a threshold value as part of determining whether the first allele at the first location in the first sample is a true positive for the first allele, the threshold value differentiating between false positives and true positives for the first allele at the first location. In certain embodiments, the reference sample comprises two samples with the lowest variant frequencies for the first allele at the first location among the at least two samples other than the first sample. In some embodiments, the probability value is determined using a chi-squared cumulative distribution function. 
In some embodiments, the probability value is determined using a Pearson proportion test. In some embodiments, the probability value is one or more of a z-score, modified z-score, p-value, chi-squared value, cumulative probability value, and quality score. In some embodiments, the quality score is determined using a look-up table. In some embodiments, the threshold is determined using a support vector machine classifier based on training data obtained from one or more sequencing runs. In some embodiments, the threshold is a function of variant frequency. In another embodiment, a computer product comprising a non-transitory computer readable medium storing a plurality of instructions that when executed control a computer system to detect true variants in a target region of a first sample is provided. In some embodiments, the instructions comprise receiving a plurality of sequence reads obtained from sequencing DNA fragments from one or more samples, the one or more samples including the first sample, wherein the sequencing includes targeting the target region in the DNA fragments; aligning the plurality of sequence reads to the target region of a reference sequence; identifying a set of sequence locations in the target region of the reference sequence that have a reference allele of variants in a variant class, wherein at least 50% of the sequence locations in the one or more samples exhibit a false positive for the variants in the variant class in the sequence reads, and wherein the set of sequence locations includes a first location; at each location of the set of sequence locations and for each sample of the one or more samples: determining a read count at each location for each sample; identifying candidate variants having variant alleles for the variants in the variant class based on sequence reads of each sample differing from the reference allele at the same location of the reference sequence, a total number of the candidate variants at each location in each sample being the variant count in 
each location for each sample; determining a variant frequency of variants in the variant class based on the read count and the variant count, the variant frequency for each location in each sample forming a statistical distribution, wherein the variant frequency at a first location in the set of sequence locations for the first sample is a first variant frequency; comparing the first variant frequency to a value of the statistical distribution to determine a probability value of the first variant frequency relative to the value of the statistical distribution; and comparing the probability value to a threshold value as part of determining whether candidate variants in the first sample are true positives, the threshold value differentiating between false positives and true positives for the variants in the variant class. In certain embodiments, the statistical distribution is the statistical distribution of a logarithmic transformation of the variant frequency at each location for each sample.
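
By way of a non-limiting illustration, the two comparisons described above can be sketched as follows: a z-score of a candidate variant frequency against the distribution of log-transformed background variant frequencies, and a Pearson chi-squared comparison of variant and wild type counts between a test sample and a reference sample. This is a minimal sketch and not the implementation of the incorporated publication. The function names and example values are hypothetical, background frequencies are assumed to be non-zero (a pseudocount would otherwise be needed before taking logarithms), and no continuity correction is applied.

```python
from math import erfc, log10, sqrt
from statistics import mean, stdev


def log_frequency_z_score(candidate_freq, background_freqs):
    """Z-score of a candidate frequency against log10 background frequencies."""
    logs = [log10(f) for f in background_freqs]
    return (log10(candidate_freq) - mean(logs)) / stdev(logs)


def proportion_test_p_value(test_variant, test_wild, ref_variant, ref_wild):
    """P-value of a Pearson chi-squared test on a 2x2 table of read counts.

    Rows are the test and reference samples; columns are variant and wild
    type counts. With one degree of freedom, the upper tail of the
    chi-squared distribution equals erfc(sqrt(chi2 / 2)).
    """
    row_test = test_variant + test_wild
    row_ref = ref_variant + ref_wild
    col_variant = test_variant + ref_variant
    col_wild = test_wild + ref_wild
    if 0 in (row_test, row_ref, col_variant, col_wild):
        return 1.0  # Degenerate table: no evidence of a difference.
    total = row_test + row_ref
    cross = test_variant * ref_wild - test_wild * ref_variant
    chi2 = total * cross**2 / (row_test * row_ref * col_variant * col_wild)
    return erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    background = [0.001, 0.002, 0.0015, 0.0008, 0.0012]  # assumed error rates
    print(log_frequency_z_score(0.02, background))
    print(proportion_test_p_value(30, 970, 2, 998))
```

Either value can then be compared with a threshold value to classify the candidate as a true positive or a false positive; the choice of threshold is left open here.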

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,340,830, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of analyzing a tumor sample. The method comprises: (a) acquiring a library comprising a plurality of target members, e.g., tumor members, from a sample, e.g., a tumor sample; (b) optionally, contacting the library with a bait set (or plurality of bait sets) to provide selected members (sometimes referred to herein as “library catch”); (c) acquiring a read for a subgenomic interval from a tumor member from said library or library catch, e.g., by sequencing, e.g., with a next generation sequencing method; (d) aligning said read; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method) from said read for a preselected nucleotide position, e.g., for a preselected nucleotide position in each of a plurality of subgenomic intervals, e.g., each of a plurality of genes, thereby analyzing said sample, wherein: (i) each of X nucleotide positions is analyzed under a unique set of conditions for one or a combination of steps (b), (c), (d), or (e) (wherein unique means different from the other X−1 sets of conditions and wherein X is at least 2, 5, 10, 20, 30, 40, 50, 100, 200, 300 or 500). 
E.g., a first set of conditions is used for a first nucleotide position, e.g., in a first subgenomic interval or gene, and a second set of conditions is used for a second nucleotide position, e.g., in a second subgenomic interval or gene; (ii) for each of X nucleotide positions, responsive to a characteristic of a preselected alteration, e.g., mutation, that can occur at the nucleotide position, the nucleotide position is analyzed under a unique set of conditions (wherein unique means different from the other X−1 sets of conditions and wherein X is at least 2, 5, 10, 20, 30, 40, 50, 100, 200, 300 or 500). E.g., responsive to a characteristic of a preselected alteration, e.g., mutation, that can occur at a nucleotide position in a first subgenomic interval, the nucleotide position is analyzed under a first set of conditions, and responsive to a characteristic of a preselected alteration, e.g., mutation, that can occur at a nucleotide position in a second subgenomic interval, the nucleotide position is analyzed under a second set of conditions; (iii) wherein said method is performed on a sample, e.g., a preserved tumor sample, under conditions that allow for 95, 98, or 99% sensitivity or specificity for nucleotide positions in at least 2, 5, 10, 20, 50 or 100 subgenomic intervals, e.g., genes; or (iv) wherein the method comprises one or more or all of: a) sequencing a first subgenomic interval to provide for about 500× or higher sequencing depth, e.g., to sequence a mutation present in no more than 5% of the cells from the sample; b) sequencing a second subgenomic interval to provide for about 200× or higher, e.g., about 200× to about 500×, sequencing depth, e.g., to sequence a mutation present in no more than 10% of the cells from the sample; c) sequencing a third subgenomic interval to provide for about 10-100× sequencing depth, e.g., to sequence one or more subgenomic intervals (e.g., exons) that are chosen from: 
a) a pharmacogenomic (PGx) single nucleotide polymorphism (SNP) that may explain the ability of a patient to metabolize different drugs, or b) a genomic SNP that may be used to uniquely identify (e.g., fingerprint) a patient; d) sequencing a fourth subgenomic interval to provide for about 5-50× sequencing depth, e.g., to detect a structural breakpoint, such as a genomic translocation or an indel. For example, detection of an intronic breakpoint requires 5-50× sequence-pair spanning depth to ensure high detection reliability. Such bait sets can be used to detect, for example, translocation/indel-prone cancer genes; or e) sequencing a fifth subgenomic interval to provide for about 0.1-300× sequencing depth, e.g., to detect copy number changes. In one embodiment, the sequencing depth ranges from about 0.1-10× sequencing depth to detect copy number changes. In other embodiments, the sequencing depth ranges from about 100-300× to detect genomic SNPs/loci that are used to assess copy number gains/losses of genomic DNA or loss-of-heterozygosity (LOH). Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a sample, e.g., a tumor sample. 
The method comprises: (a) acquiring a library comprising a plurality of members from a sample, e.g., a plurality of tumor members from a tumor sample; (b) optionally, enriching the library for preselected sequences, e.g., by contacting the library with a bait set (or plurality of bait sets) to provide selected members (sometimes referred to herein as library catch); (c) acquiring a read for a subgenomic interval from a member, e.g., a tumor member from said library or library catch, e.g., by a method comprising sequencing, e.g., with a next generation sequencing method; (d) aligning said read by an alignment method; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method) from said read for the preselected nucleotide position, thereby analyzing said tumor sample, wherein a read from each of X unique subgenomic intervals is aligned with a unique alignment method, wherein unique subgenomic interval means different from the other X−1 subgenomic intervals, and wherein unique alignment method means different from the other X−1 alignment methods, and X is at least 2. In an embodiment, step (b) is present. In an embodiment, step (b) is absent. In an embodiment, X is at least 3, 4, 5, 10, 15, 20, 30, 50, 100, 500, or 1,000. 
In an embodiment, a method (e.g., element (d) of the method recited above) comprises selecting or using an alignment method for analyzing, e.g., aligning, a read, wherein said alignment method is a function of, is selected responsive to, or is optimized for, one or more or all of: (i) tumor type, e.g., the tumor type in said sample; (ii) the gene, or type of gene, in which said subgenomic interval being sequenced is located, e.g., a gene or type of gene characterized by a preselected variant or type of variant, e.g., a mutation, or by a mutation of a preselected frequency; (iii) the site (e.g., nucleotide position) being analyzed; (iv) the type of variant, e.g., a substitution, within the subgenomic interval being evaluated; (v) the type of sample, e.g., an FFPE sample; and (vi) sequence in or near said subgenomic interval being evaluated, e.g., the expected propensity for misalignment for said subgenomic interval, e.g., the presence of repeated sequences in or near said subgenomic interval. Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a sample, e.g., a tumor sample. 
The method comprises: (a) acquiring a library comprising a plurality of members from a sample, e.g., a plurality of tumor members from the sample, e.g., the tumor sample; (b) optionally, enriching the library for preselected sequences, e.g., by contacting the library with a bait set (or plurality of bait sets) to provide selected members, e.g., a library catch; (c) acquiring a read for a subgenomic interval from a member, e.g., a tumor member from said library or library catch, e.g., by a method comprising sequencing, e.g., with a next generation sequencing method; (d) aligning said read by an alignment method; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method or a calling method) from said read for the preselected nucleotide position, thereby analyzing said tumor sample, wherein a nucleotide value for a nucleotide position in each of X unique subgenomic intervals is assigned by a unique calling method, wherein unique subgenomic interval means different from the other X−1 subgenomic intervals, and wherein unique calling method means different from the other X−1 calling methods, and X is at least 2. The calling methods can differ, and thereby be unique, e.g., by relying on different Bayesian prior values. In an embodiment, step (b) is present. In an embodiment, step (b) is absent. Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a sample, e.g., a tumor sample. 
The method comprises: (a) acquiring a library comprising a plurality of members (e.g., target members) from a sample, e.g., a plurality of tumor members from a tumor sample; (b) contacting the library with a bait set to provide selected members (e.g., a library catch); (c) acquiring a read for a subgenomic interval from a member, e.g., a tumor member from said library or library catch, e.g., by a method comprising sequencing, e.g., with a next generation sequencing method; (d) aligning said read by an alignment method; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method or a calling method) from said read for the preselected nucleotide position, thereby analyzing said tumor sample, wherein the method comprises contacting the library with a plurality, e.g., at least two, three, four, or five, of baits or bait sets, wherein each bait or bait set of said plurality has a unique (as opposed to the other bait sets in the plurality), preselected efficiency for selection. E.g., each unique bait or bait set provides for a unique depth of sequencing. In an embodiment, the efficiency of selection of a first bait set in the plurality differs from the efficiency of a second bait set in the plurality by at least 2 fold. In an embodiment, the first and second bait sets provide for a depth of sequencing that differs by at least 2 fold. 
In an embodiment, the method comprises contacting one, or a plurality of the following bait sets with the library: a) a bait set that selects sufficient members comprising a subgenomic interval to provide for about 500× or higher sequencing depth, e.g., to sequence a mutation present in no more than 5% of the cells from the sample; b) a bait set that selects sufficient members comprising a subgenomic interval to provide for about 200× or higher, e.g., about 200× to about 500×, sequencing depth, e.g., to sequence a mutation present in no more than 10% of the cells from the sample; c) a bait set that selects sufficient members comprising a subgenomic interval to provide for about 10-100× sequencing depth, e.g., to sequence one or more subgenomic intervals (e.g., exons) that are chosen from: a) a pharmacogenomic (PGx) single nucleotide polymorphism (SNP) that may explain the ability of a patient to metabolize different drugs, or b) a genomic SNP that may be used to uniquely identify (e.g., fingerprint) a patient; d) a bait set that selects sufficient members comprising a subgenomic interval to provide for about 5-50× sequencing depth, e.g., to detect a structural breakpoint, such as a genomic translocation or an indel. For example, detection of an intronic breakpoint requires 5-50× sequence-pair spanning depth to ensure high detection reliability. Such bait sets can be used to detect, for example, translocation/indel-prone cancer genes; or e) a bait set that selects sufficient members comprising a subgenomic interval to provide for about 0.1-300× sequencing depth, e.g., to detect copy number changes. In one embodiment, the sequencing depth ranges from about 0.1-10× sequencing depth to detect copy number changes. In other embodiments, the sequencing depth ranges from about 100-300× to detect genomic SNPs/loci that are used to assess copy number gains/losses of genomic DNA or loss-of-heterozygosity (LOH). 
Such bait sets can be used to detect, for example, amplification/deletion-prone cancer genes. Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a sample, e.g., a tumor sample. The method comprises: (a) acquiring a library comprising a plurality of members from a sample, e.g., a plurality of tumor members from a tumor sample; (b) optionally, enriching the library for preselected sequences, e.g., by contacting the library with a bait set (or plurality of bait sets) to provide selected members (e.g., a library catch); (c) acquiring a read for a subgenomic interval from a member, e.g., a tumor member from said library or library catch, e.g., by a method comprising sequencing, e.g., with a next generation sequencing method; (d) aligning said read by an alignment method; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method or a calling method) from said read for the preselected nucleotide position, thereby analyzing said tumor sample, wherein the method comprises sequencing, e.g., by a next generation sequencing method, a subgenomic interval from at least five, six, seven, eight, nine, ten, fifteen, twenty, twenty-five, thirty or more genes or gene products from the sample, wherein the genes or gene products are chosen from: ABL1, AKT1, AKT2, AKT3, ALK, APC, AR, BRAF, CCND1, CDK4, CDKN2A, CEBPA, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FLT3, HRAS, JAK2, KIT, KRAS, MAP2K1, MAP2K2, MET, MLL, MYC, NF1, NOTCH1, NPM1, NRAS, NTRK3, PDGFRA, PIK3CA, PIK3CG, PIK3R1, PTCH1, PTCH2, PTEN, RB1, RET, SMO, STK11, SUFU, or TP53. In an embodiment, step (b) is present. In an embodiment, step (b) is absent.
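
By way of a non-limiting illustration, the relationship between sequencing depth and the ability to sequence a mutation present in a small fraction of cells can be sketched with a binomial model. This is a minimal sketch under stated assumptions, none of which come from the incorporated patent: the function name is hypothetical, reads are assumed to be sampled independently, a heterozygous mutation in a diploid region present in 5% of cells is assumed to contribute about 2.5% of reads, and a call is assumed to require at least five supporting reads.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """Probability of observing at least `min_alt_reads` mutant reads.

    Models the number of mutant reads at a position as
    Binomial(depth, allele_fraction).
    """
    # Sum the probabilities of seeing fewer than the required number of reads.
    p_too_few = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction)**(depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    # Assumed: 5% of cells mutated, heterozygous, diploid -> 2.5% of reads.
    for depth in (50, 200, 500):
        print(depth, round(detection_probability(depth, 0.025, 5), 3))
```

Under these assumptions the mutation is very likely to be supported by enough reads at about 500× and unlikely to be supported at about 50×, which is consistent with assigning the deepest sequencing to intervals where low-frequency mutations are expected.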

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/151524, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of evaluating the mutation load in a sample, by providing a sequence of a set of subgenomic intervals from the sample; and determining a value for the mutational load, wherein the value is a function of the number of alterations in the set of subgenomic intervals. In certain embodiments, the set of subgenomic intervals are from a predetermined set of genes, for example, a predetermined set of genes that does not include the entire genome or exome. In certain embodiments, the set of subgenomic intervals is a set of coding subgenomic intervals. In other embodiments, the set of subgenomic intervals contains both a coding subgenomic interval and a non-coding subgenomic interval. In certain embodiments, the value for the mutation load is a function of the number of an alteration (e.g., a somatic alteration) in the set of subgenomic intervals. In certain embodiments, the number of an alteration excludes a functional alteration, a germline alteration, or both. In some embodiments, the sample is a tumor sample or a sample derived from a tumor. 
Additionally or alternatively, detection of a genetic biomarker can include methods comprising, e.g., one or more of: acquiring a library comprising a plurality of tumor members from the sample; contacting the library with a bait set to provide selected tumor members by hybridization, thereby providing a library catch; acquiring a read for a subgenomic interval comprising an alteration from the tumor member from the library catch; aligning the read by an alignment method; assigning a nucleotide value from the read for a preselected nucleotide position; and selecting a set of subgenomic intervals from a set of the assigned nucleotide positions, wherein the set of subgenomic intervals are from a predetermined set of genes. Additionally or alternatively, detection of a genetic biomarker can include a method of evaluating the mutation load in a sample, e.g., a tumor sample (e.g., a sample acquired from a tumor). The method includes: a) providing a sequence, e.g., a nucleotide sequence, of a set of subgenomic intervals (e.g., coding subgenomic intervals) from the sample, wherein the set of subgenomic intervals are from a predetermined set of genes; and b) determining a value for the mutation load, wherein the value is a function of the number of an alteration (e.g., one or more alterations), e.g., a somatic alteration (e.g., one or more somatic alterations), in the set of subgenomic intervals. In certain embodiments, the number of an alteration excludes a functional alteration in a subgenomic interval. In other embodiments, the number of an alteration excludes a germline alteration in a subgenomic interval. In certain embodiments, the number of an alteration excludes a functional alteration in a subgenomic interval and a germline alteration in a subgenomic interval. In certain embodiments, the set of subgenomic intervals comprises coding subgenomic intervals. In other embodiments, the set of subgenomic intervals comprises non-coding subgenomic intervals.
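Steps a) and b) above, together with the optional exclusion of functional and germline alterations, can be sketched as follows. This is a minimal illustration under stated assumptions: the `Alteration` record, the `mutation_load` function, and the normalization per megabase of covered sequence are hypothetical choices, since the text only requires that the value be a function of the number of alterations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alteration:
    """Hypothetical record for one called alteration."""
    gene: str
    is_germline: bool
    is_functional: bool


def mutation_load(alterations, panel_genes, covered_megabases,
                  exclude_germline=True, exclude_functional=True):
    """Return (count, count per megabase) for alterations in the panel.

    Only alterations in the predetermined gene set are counted.
    Germline and functional alterations are skipped when the
    corresponding flag is set. Per-megabase scaling is an assumption.
    """
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    count = 0
    for alt in alterations:
        if alt.gene not in panel_genes:
            continue
        if exclude_germline and alt.is_germline:
            continue
        if exclude_functional and alt.is_functional:
            continue
        count += 1
    return count, count / covered_megabases


if __name__ == "__main__":
    calls = [
        Alteration("TP53", is_germline=False, is_functional=True),
        Alteration("KRAS", is_germline=False, is_functional=False),
        Alteration("KRAS", is_germline=True, is_functional=False),
    ]
    print(mutation_load(calls, {"TP53", "KRAS"}, covered_megabases=2.0))
```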
In certain embodiments, the set of subgenomic intervals comprises coding subgenomic intervals. In other embodiments, the set of subgenomic intervals comprises one or more coding subgenomic intervals and one or more non-coding subgenomic intervals. In certain embodiments, about 5% or more, about 10% or more, about 20% or more, about 30% or more, about 40% or more, about 50% or more, about 60% or more, about 70% or more, about 80% or more, about 90% or more, or about 95% or more, of the subgenomic intervals in the set of subgenomic intervals are coding subgenomic intervals. In other embodiments, about 90% or less, about 80% or less, about 70% or less, about 60% or less, about 50% or less, about 40% or less, about 30% or less, about 20% or less, about 10% or less, or about 5% or less, of the subgenomic intervals in the set of subgenomic intervals are non-coding subgenomic intervals. In other embodiments, the set of subgenomic intervals does not comprise the entire genome or the entire exome. In other embodiments, the set of coding subgenomic intervals does not comprise the entire exome. In certain embodiments, the mutation load is expressed as a percentile, e.g., among the mutation loads in samples from a reference population. In certain embodiments, the reference population includes patients having the same type of cancer as the subject. In other embodiments, the reference population includes patients who are receiving, or have received, the same type of therapy as the subject. Additionally or alternatively, detection of a genetic biomarker can include a method of evaluating the mutation load in a sample, e.g., a tumor sample or a sample derived from a tumor.
The method includes: (i) acquiring a library comprising a plurality of tumor members from the sample; (ii) contacting the library with a bait set to provide selected tumor members, wherein said bait set hybridizes with the tumor member, thereby providing a library catch; (iii) acquiring a read for a subgenomic interval comprising an alteration (e.g., a somatic alteration) from a tumor member from said library catch, e.g., by a next-generation sequencing method; (iv) aligning said read by an alignment method; (v) assigning a nucleotide value from said read for a preselected nucleotide position; (vi) selecting a set of subgenomic intervals (e.g., coding subgenomic intervals) from a set of the assigned nucleotide positions, wherein the set of subgenomic intervals are from a predetermined set of genes; and (vii) determining a value for the mutational load, wherein the value is a function of the number of an alteration (e.g., one or more alterations), e.g., a somatic alteration (e.g., one or more somatic alterations), in the set of subgenomic intervals. In certain embodiments, the number of an alteration (e.g., a somatic alteration) excludes a functional alteration in a subgenomic interval. In other embodiments, the number of an alteration excludes a germline alteration in a subgenomic interval. In certain embodiments, the number of an alteration (e.g., a somatic alteration) excludes a functional alteration in a subgenomic interval and a germline alteration in a subgenomic interval. In certain embodiments, the predetermined set of genes comprises a plurality of genes which, in mutant form, are associated with an effect on cell division, growth or survival, or are associated with a cancer. In certain embodiments, the method further comprises acquiring a library comprising a plurality of tumor members from the sample.
In certain embodiments, the method further comprises contacting a library with a bait set to provide selected tumor members, wherein said bait set hybridizes with a tumor member from the library, thereby providing a library catch. In certain embodiments, the method further comprises acquiring a read for a subgenomic interval comprising the alteration (e.g., somatic alteration) from a tumor member from a library or library catch, thereby acquiring a read for the subgenomic interval, e.g., by a next-generation sequencing method. In certain embodiments, the method further comprises aligning a read for the subgenomic interval by an alignment method. In certain embodiments, the method further comprises assigning a nucleotide value for a preselected nucleotide position from a read for the subgenomic interval, e.g., by a mutation calling method. In certain embodiments, the method further comprises one, two, three, four, or all of: (a) acquiring a library comprising a plurality of tumor members from the sample; (b) contacting the library with a bait set to provide selected tumor members, wherein said bait set hybridizes with the tumor member, thereby providing a library catch; (c) acquiring a read for a subgenomic interval comprising the alteration (e.g., somatic alteration) from a tumor member from said library catch, thereby acquiring a read for the subgenomic interval, e.g., by a next-generation sequencing method; (d) aligning said read by an alignment method; or (e) assigning a nucleotide value from said read for a preselected nucleotide position, e.g., by a mutation calling method. In certain embodiments, the germline alteration is excluded by a method or system comprising the use of an SGZ algorithm.
In certain embodiments, the method further comprises characterizing a variant, e.g., an alteration, in the tumor sample by: a) acquiring: i) a sequence coverage input (SCI), which comprises, for each of a plurality of selected subgenomic intervals, a value for normalized sequence coverage at the selected subgenomic intervals, wherein SCI is a function of the number of reads for a subgenomic interval and the number of reads for a process-matched control; ii) an SNP allele frequency input (SAFI), which comprises, for each of a plurality of selected germline SNPs, a value for the allele frequency in the tumor sample, wherein SAFI is based, at least in part, on a minor or alternative allele frequency in the tumor sample; and iii) a variant allele frequency input (VAFI), which comprises the allele frequency for said variant in the tumor sample; b) acquiring values, as a function of SCI and SAFI, for: i) a genomic segment total copy number (C) for each of a plurality of genomic segments; ii) a genomic segment minor allele copy number (M) for each of a plurality of genomic segments; and iii) sample purity (p), wherein the values of C, M, and p are obtained by fitting a genome-wide copy number model to SCI and SAFI; and c) acquiring a value for mutation type, g, which is indicative of the variant being somatic, a subclonal somatic variant, germline, or not-distinguishable, and is a function of VAFI, p, C, and M. The SGZ algorithm is described in International Application Publication No. WO 2014/183078 and U.S. Application Publication No. 2014/0336996, the contents of which are incorporated by reference in their entirety. Additionally or alternatively, detection of a genetic biomarker can include a system for evaluating the mutation load in a sample (e.g., a tumor sample or a sample derived from a tumor).
The system includes at least one processor operatively connected to a memory, the at least one processor when executing is configured to: a) acquire a sequence, e.g., a nucleotide sequence, of a set of subgenomic intervals (e.g., coding subgenomic intervals) from the sample, wherein the set of coding subgenomic intervals are from a predetermined set of genes; and b) determine a value for the mutational load, wherein the value is a function of the number of an alteration (e.g., a somatic alteration) in the set of subgenomic intervals. Additionally or alternatively, detection of a genetic biomarker can include a method of analyzing a sample, e.g., a tumor sample from a hematologic malignancy (or premalignancy). The method comprises: (a) acquiring one or a plurality of libraries comprising a plurality of members from a sample, e.g., a plurality of tumor members from a tumor sample; (b) optionally, enriching the one or a plurality of libraries for preselected sequences, e.g., by contacting the one or a plurality of libraries with a bait set (or plurality of bait sets) to provide selected members (sometimes referred to herein as library catch); (c) acquiring a read for a subject interval, e.g., a subgenomic interval or an expressed subgenomic interval, from a member, e.g., a tumor member from a library or library catch, e.g., by a method comprising sequencing, e.g., with a next-generation sequencing method; (d) aligning said read by an alignment method; and (e) assigning a nucleotide value (e.g., calling a mutation, e.g., with a Bayesian method) from said read for the preselected nucleotide position, thereby analyzing said tumor sample, optionally wherein: a read from each of X unique subject intervals (e.g., subgenomic intervals, expressed subgenomic intervals, or both) is aligned with a unique alignment method, wherein unique subject interval (e.g., subgenomic interval or expressed subgenomic interval) means different from the other X−1 subject intervals (e.g., subgenomic intervals, expressed
subgenomic intervals, or both), and wherein unique alignment method means different from the other X−1 alignment methods, and X is at least 2. In an embodiment, a method (e.g., element (d) of the method recited above) comprises selecting or using an alignment method for analyzing, e.g., aligning, a read, wherein said alignment method is a function of, is selected responsive to, or is optimized for, one or more or all of: (i) tumor type, e.g., the tumor type in said sample; (ii) the gene, or type of gene, in which said subject interval (e.g., subgenomic interval or expressed subgenomic interval) being sequenced is located, e.g., a gene or type of gene characterized by a preselected variant or type of variant, e.g., a mutation, or by a mutation of a preselected frequency; (iii) the site (e.g., nucleotide position) being analyzed; (iv) the type of variant, e.g., a substitution, within the subject interval (e.g., subgenomic interval or expressed subgenomic interval) being evaluated; (v) the type of sample, e.g., an FFPE sample, a blood sample, or a bone marrow aspirate sample; and (vi) sequence in or near said subgenomic interval being evaluated, e.g., the expected propensity for misalignment for said subject interval (e.g., subgenomic interval or expressed subgenomic interval), e.g., the presence of repeated sequences in or near said subject interval (e.g., subgenomic interval or expressed subgenomic interval). In some embodiments, the method can comprise using an alignment method that is appropriately tuned and that includes: selecting a rearrangement reference sequence for alignment with a read, wherein said rearrangement reference sequence is preselected to align with a preselected rearrangement (in embodiments the reference sequence is not identical to the genomic rearrangement); and comparing, e.g., aligning, a read with said preselected rearrangement reference sequence. In embodiments, other methods are used to align troublesome reads.
These methods are particularly effective when the alignment of reads for a relatively large number of diverse subgenomic intervals is optimized. By way of example, a method of analyzing a tumor sample can comprise: performing a comparison, e.g., an alignment comparison, of a read under a first set of parameters (e.g., a first mapping algorithm or with a first reference sequence), and determining if said read meets a first predetermined alignment criterion (e.g., the read can be aligned with said first reference sequence, e.g., with less than a preselected number of mismatches); if said read fails to meet the first predetermined alignment criterion, performing a second alignment comparison under a second set of parameters (e.g., a second mapping algorithm or with a second reference sequence); and, optionally, determining if said read meets said second predetermined criterion (e.g., the read can be aligned with said second reference sequence with less than a preselected number of mismatches), wherein said second set of parameters comprises use of a set of parameters, e.g., said second reference sequence, which, compared with said first set of parameters, is more likely to result in an alignment with a read for a preselected variant, e.g., a rearrangement, e.g., an insertion, deletion, or translocation.
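The two-pass comparison described above can be sketched as follows. This is a toy illustration, not a real read mapper: alignment is reduced to ungapped placement scored by mismatch count, and the function names, the `mismatch_limit` parameter, and the example sequences are hypothetical.

```python
def best_mismatch_count(read, reference):
    """Fewest mismatches over all ungapped placements of read on reference.

    Returns None when the read is longer than the reference.
    """
    n = len(read)
    if n == 0 or n > len(reference):
        return None
    return min(
        sum(a != b for a, b in zip(read, reference[i:i + n]))
        for i in range(len(reference) - n + 1)
    )


def align_two_pass(read, first_reference, second_reference, mismatch_limit=2):
    """Try the first reference; fall back to the second if the read fails.

    A read meets the criterion when it aligns with fewer than
    `mismatch_limit` mismatches. Returns (label, mismatches) or
    (None, None) when neither comparison meets the criterion.
    """
    first = best_mismatch_count(read, first_reference)
    if first is not None and first < mismatch_limit:
        return "first", first
    second = best_mismatch_count(read, second_reference)
    if second is not None and second < mismatch_limit:
        return "second", second
    return None, None


if __name__ == "__main__":
    # Hypothetical sequences: the second reference carries a 4-base deletion.
    wild_type = "ACGTACGTTTGACCAGGT"
    deletion_reference = "ACGTACGTCCAGGT"
    read_across_junction = "TACGTCCAGG"
    print(align_two_pass(read_across_junction, wild_type, deletion_reference))
```

In this example the read spans the deletion junction, so it fails the first criterion against the wild-type reference and is placed without mismatches on the rearrangement reference.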

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2013/0266938, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods of detecting the presence or absence of a target nucleic acid sequence in a biological sample. In some embodiments, the method comprises: a. contacting a detectably-labeled probe comprising an anchor nucleic acid domain and a reporter nucleic acid domain with the sample; and b. detecting the presence or absence of binding of the probe to the target nucleic acid, wherein the anchor and reporter domains are linked by a non-nucleoside linker, and neither the anchor nor the reporter domain forms a stem loop in the absence of the target nucleic acid; and wherein (i) the probe is not extendible by a polymerase; (ii) the linker is linked to the anchor domain within 2 nucleotides of the 3′ end of the anchor domain and the linker is linked to the reporter domain within 2 nucleotides of the 5′ end of the reporter domain, wherein the anchor domain is not linked to a detectable label; and/or (iii) the anchor domain and the reporter domain each comprise a contiguous sequence of at least 6 nucleotides complementary to the same strand of the target nucleic acid. In some embodiments, the probe is not extendible by a polymerase. In some embodiments, the linker is linked to the anchor domain within 2 nucleotides of the 3′ end of the anchor domain and the linker is linked to the reporter domain within 2 nucleotides of the 5′ end of the reporter domain, wherein the anchor domain is not linked to a detectable label. In some embodiments, the anchor domain and the reporter domain each comprise a contiguous sequence of at least 10 nucleotides complementary to one strand of the target nucleic acid.
In some embodiments, the detecting step comprises measuring the melting temperature of a complex formed between the reporter domain and the target nucleic acid. In some embodiments, the length of the reporter domain is between 4 and 20 nucleotides. In some embodiments, the length of the reporter domain is between 6 and 12 nucleotides. Additionally or alternatively, detection of a genetic biomarker can include reaction mixtures for detecting the presence or absence of a target sequence. In some embodiments, the reaction mixture comprises: a. a target nucleic acid comprising an anchor binding region and a reporter binding region, and b. a detectably-labeled probe comprising an anchor nucleic acid domain and a reporter nucleic acid domain, wherein the anchor and reporter domains are linked by a non-nucleoside linker, and neither the anchor nor the reporter domain forms a stem loop in the absence of the target nucleic acid, and wherein (i) the probe is not extendible by a polymerase; (ii) the linker is linked to the anchor domain within 2 nucleotides of the 3′ end of the anchor domain and the linker is linked to the reporter domain within 2 nucleotides of the 5′ end of the reporter domain, wherein the anchor domain is not linked to a detectable label; and/or (iii) the anchor domain and the reporter domain each comprise a contiguous sequence of at least 6 nucleotides complementary to the same strand of the target nucleic acid. In some embodiments, the reaction mixture further comprises nucleoside triphosphates, a DNA polymerase, and/or an oligonucleotide primer. In some embodiments, the probe is not extendible by a polymerase.
Additionally or alternatively, detection of a genetic biomarker can include a detectably-labeled probe comprising an anchor nucleic acid domain and a reporter nucleic acid domain, wherein: the anchor and reporter domains are linked by a non-nucleoside linker; neither the anchor nor the reporter domain forms a stem loop in the absence of the target nucleic acid; and wherein: (i) the probe is not extendible by a polymerase; and/or (ii) the linker is linked to the anchor domain within 2 nucleotides of the 3′ end of the anchor domain and the linker is linked to the reporter domain within 2 nucleotides of the 5′ end of the reporter domain, wherein the anchor domain is not linked to a detectable label. In some embodiments, the probe is not extendible by a polymerase. In some embodiments, the linker is linked to the anchor domain within 2 nucleotides of the 3′ end of the anchor domain and the linker is linked to the reporter domain within 2 nucleotides of the 5′ end of the reporter domain, wherein the anchor domain is not linked to a detectable label. In some embodiments, the length of the reporter domain is between 4 and 20 nucleotides. In some embodiments, the length of the reporter domain is between 6 and 12 nucleotides. In some embodiments, the length of the anchor domain is between 6 and 40 nucleotides. In some embodiments, the label is a fluorescent label. In some embodiments, the probe comprises at least one non-natural nucleotide, wherein the non-natural nucleotide increases the melting temperature of the reporter domain compared to a corresponding natural nucleotide in the place of the non-natural nucleotide. In some embodiments, the linker is polyethylene glycol. In some embodiments, the linker is hexa-ethylene glycol.
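The structural features recited for the anchor/reporter probe can be collected into a simple design check. The following is a minimal sketch under stated assumptions: the `LinkedProbe` record, its field names, and the `design_issues` function are hypothetical, and the sketch checks all recited features at once even though the text presents several of them as alternatives ("and/or").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedProbe:
    """Hypothetical description of an anchor/reporter probe."""
    anchor: str                    # anchor domain sequence, 5' to 3'
    reporter: str                  # reporter domain sequence, 5' to 3'
    linker: str                    # e.g. "hexa-ethylene glycol"
    anchor_link_from_3prime: int   # nucleotides between linker and anchor 3' end
    reporter_link_from_5prime: int # nucleotides between linker and reporter 5' end
    extendible: bool               # True if a polymerase could extend the probe
    anchor_labeled: bool           # True if the anchor carries a detectable label


def design_issues(probe, reporter_range=(4, 20), anchor_range=(6, 40)):
    """Return a list of recited features that the probe does not satisfy."""
    issues = []
    if not reporter_range[0] <= len(probe.reporter) <= reporter_range[1]:
        issues.append("reporter length out of range")
    if not anchor_range[0] <= len(probe.anchor) <= anchor_range[1]:
        issues.append("anchor length out of range")
    if probe.anchor_link_from_3prime > 2:
        issues.append("linker not within 2 nt of anchor 3' end")
    if probe.reporter_link_from_5prime > 2:
        issues.append("linker not within 2 nt of reporter 5' end")
    if probe.extendible:
        issues.append("probe is extendible by a polymerase")
    if probe.anchor_labeled:
        issues.append("anchor domain carries a detectable label")
    return issues


if __name__ == "__main__":
    probe = LinkedProbe(
        anchor="ACGTTGCAAGCTTGCA", reporter="GGTACCTA",
        linker="hexa-ethylene glycol",
        anchor_link_from_3prime=0, reporter_link_from_5prime=0,
        extendible=False, anchor_labeled=False,
    )
    print(design_issues(probe))
```

Passing `reporter_range=(6, 12)` applies the narrower reporter length recited above.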

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2012/0225428, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a set of probes comprising DNA and LNA nucleotides. In some embodiments, at the 5′ end of the probes, the nucleobases are determined, whereas at the 3′ end, there are one or more (for example, two or three) random nucleotides (also referred to as “wobble” positions). An embodiment consists of a short nucleic acid strand that can be used universally for the detection of various target sequences. The short nucleic acid sequence is also allele specific and enables the detection of a specific mutation, such as a single nucleotide polymorphism (SNP). Some embodiments include a composition comprising a first probe and a second probe. According to such embodiments, the first probe has a 5′ end opposite a 3′ end and at least eight nucleotides, the at least eight nucleotides comprising at least one DNA nucleotide and at least five locked nucleic acid nucleotides and a first discriminating position; and a second probe having a 5′ end opposite a 3′ end and a same number of nucleotides as the first probe. The nucleotides of the second probe comprise a same number of DNA nucleotides and locked nucleic acid nucleotides as the first probe and a second discriminating position located at a position corresponding to the first discriminating position in the first probe. Also, according to such embodiments, the nucleotides of the first and second probes comprise one of an adenine nucleobase, a cytosine nucleobase, a guanine nucleobase, a thymine nucleobase, a uracil nucleobase, and a methyl cytosine nucleobase, and the first and second probes comprise differing nucleobases at the first and second discriminating positions.
However, according to such embodiments, the first and second probes comprise the same nucleobases at all other nucleotide positions of the probes. Other embodiments comprise a composition including a first and second set of probes. Each probe of the first and second sets has a 5′ end opposite a 3′ end and eight nucleotides. The nucleotides of each probe of the first set have at least one DNA nucleotide, at least five locked nucleic acid nucleotides, and a first discriminatory position, at least one locked nucleic acid nucleotide being a random locked nucleic acid nucleotide, whereas each probe of the second set of probes has a corresponding number of DNA nucleotides, locked nucleic acid nucleotides, and random locked nucleic acid nucleotides as a probe in the first set, and each probe of the second set has a second discriminating position located at a same nucleotide location as a first discriminating position of a probe in the first set. Also, according to some such embodiments, all probes of the first and second sets have the same nucleobase sequence with the exception of (i) the nucleobase at the random locked nucleic acid nucleotides; and (ii) the nucleobase at the first and second discriminating positions. Also, the nucleobase of the second discriminating position differs from the nucleobase of the first discriminating position at the same nucleotide location, and the at least one random locked nucleic acid nucleotide of each probe of the second set comprises a same nucleobase located at a same nucleotide location of the at least one random locked nucleic acid nucleotide of a probe of the first set. According to such embodiments, the nucleobase of the random locked nucleic acid nucleotide is selected from one of adenine, cytosine, guanine, and thymine, and any possible nucleobase sequence resulting from nucleobase variations at the one or more random locked nucleic acid nucleobase position(s) is represented by at least one probe in both the first and second set of probes.
Additionally or alternatively, detection of a genetic biomarker can include a method of determining a genotype at a locus of interest in a sample comprising genetic material. The method includes the steps of contacting the genetic material with a first probe and a second probe and detecting the binding of the first or second probe to the genetic material, thereby determining the genotype at the locus. According to such embodiments, the first and second probes each have a 5′ end opposite a 3′ end and eight nucleotides comprising at least one DNA nucleotide and at least five locked nucleic acid nucleotides. The nucleotides of the first probe comprise a first discriminating position and the nucleotides of the second probe comprise a second discriminating position at a same nucleotide location in the second probe as the first discriminating position in the first probe. Also, the first discriminating position comprises a different nucleobase than the second discriminating position, wherein the nucleobases at the other nucleotides of the first and second probes are the same. Additionally or alternatively, detection of a genetic biomarker can include a composition including a first set of probes and a second set of probes, each of the probes having eight nucleotides being composed of one to three DNA nucleotides and five to seven LNA (locked nucleic acid) nucleotides. According to such embodiments, all probes of the first and the second set of probes have identical nucleotide sequences with the exception of (i) the base(s) at one, two or three LNA random position(s); and (ii) the base at a discriminating position, wherein the one, two or three LNA random position(s) and the discriminating position are located at identical positions in all probes of the first and the second set.
Further, according to such embodiments, at each LNA random position the base is independently selected from adenine, cytosine, guanine and thymine, and any possible sequence resulting from the base variation(s) at the one, two or three LNA random position(s) is represented by at least one probe in each set of probes. Additionally, according to such embodiments, the base at the discriminating position is identical within each set of probes, but differs between the first and the second set of probes. Additionally or alternatively, detection of a genetic biomarker can include a library of at least two sets of probes. According to such embodiments, the library comprises a plurality of sets of probes, each of the probes having eight nucleotides with the general structure 5′-D-L-L-L-L-L-X-X-3′ or 5′-D-L-L-L-L-X-X-X-3′ (where D is a DNA nucleotide; each L is an LNA nucleotide; and each X is an LNA random nucleotide). Also, within one set of probes, all probes have identical nucleotide sequences with the exception of the two and/or three LNA random nucleotides (with each position of an LNA random nucleotide base being independently selected from adenine, cytosine, guanine and thymine). Also, according to such embodiments, any possible sequence resulting from the base variation(s) at the two positions is represented by a probe in each set of probes, and one set of probes differs from the other set of probes in the sequence of at least the DNA nucleotide D or an LNA nucleotide L. Additionally or alternatively, detection of a genetic biomarker can include a method of determining the genotype at a locus of interest in a sample obtained from a subject. The method includes the steps of contacting the sample comprising the genetic material with any of the compositions of the composition embodiments above, and detecting the binding of a probe of the first or the second set of probes to the genetic material, thereby determining the genotype at the locus.
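The probe-set structure described above, with a fixed 5′ portion and two or three random 3′ positions, can be enumerated directly. The following is a minimal sketch under stated assumptions: the function names and the example sequences are hypothetical, and each probe is represented only by its nucleobase string, with the D/L/X chemistry pattern tracked as a separate label.

```python
from itertools import product


def probe_set(fixed_prefix, n_random):
    """Enumerate all 8-mers: a fixed 5' prefix plus n_random random 3' bases.

    Every combination of A, C, G and T at the random positions is
    represented exactly once, giving 4 ** n_random probes.
    """
    if n_random not in (2, 3) or len(fixed_prefix) + n_random != 8:
        raise ValueError("expected an 8-nucleotide probe with 2 or 3 random positions")
    return [fixed_prefix + "".join(tail) for tail in product("ACGT", repeat=n_random)]


def chemistry_pattern(n_random):
    """Return the D/L/X pattern, e.g. 'DLLLLLXX' for two random positions."""
    return "D" + "L" * (7 - n_random) + "X" * n_random


def paired_sets(fixed_prefix, discriminating_index, other_base, n_random=2):
    """Build two sets that differ only at the discriminating position."""
    if fixed_prefix[discriminating_index] == other_base:
        raise ValueError("the two sets must differ at the discriminating position")
    second_prefix = (
        fixed_prefix[:discriminating_index]
        + other_base
        + fixed_prefix[discriminating_index + 1:]
    )
    return probe_set(fixed_prefix, n_random), probe_set(second_prefix, n_random)


if __name__ == "__main__":
    first, second = paired_sets("ACGTCA", discriminating_index=3, other_base="C")
    print(chemistry_pattern(2), len(first), first[:3], second[:3])
```

Because both sets are enumerated in the same order, probes at the same list index share their random bases and differ only at the discriminating position.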

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2010/0248991, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a solid support comprising at least two sequence specific amplification primers wherein at least one primer is bound to said support with an inducible cleavable linker. Preferably, said cleavable linker is a photo-cleavable linker. In a first major embodiment, said solid support is a bead. Such a bead is composed of a material selected from the group consisting of silicon, titanium-dioxide, aluminum oxide, lanthanide oxide, glass, silicates, polystyrene, cellulose, sepharose and polyamide. A bead is either of one pure material or composed of two or more materials, whereas the two or more materials are mixed or assembled in an ordered manner, as in core-shell particles. The surface of a bead is functionalized in such a manner that oligonucleotides can be attached. Additionally or alternatively, detection of a genetic biomarker can include a library of beads as disclosed above. Preferably, each member of the plurality of primers which are bound to the bead via a cleavable linker carries a different detectable label or a unique mixture of multiple labels. In some embodiments, the solid support is a microtiter or picotiter (PTP) plate comprising a plurality of wells, characterized in that a plurality of said wells comprises a surface with at least two sequence specific amplification primers wherein at least one primer is bound to said support with a cleavable linker.
Additionally or alternatively, detection of a genetic biomarker can include a method for preparing a solid support and preferably a bead comprising at least two sequence specific primers, further characterized in that at least one of said primers is cleavable, said method comprising the steps of providing a solid support carrying at least one or more functional groups, and reacting said one or more functional groups with the reactive group or groups of two sequence specific primers, wherein a cleavable reactive moiety is present either within one of the spacers connecting said solid support with its functional group or one of its functional groups or said cleavable moiety is present within one of the spacers connecting one of said sequence specific primers with its reactive group. Additionally or alternatively, detection of a genetic biomarker can include a method comprising the steps of providing a solid support comprising two functional groups each carrying a different protecting group, deprotecting a first functional group and reacting said group with the reactive group of a first primer, and deprotecting the second functional group and reacting said group of said bead with the reactive group of a second primer. Said two functional groups are connected to the bead via two separate linkers, but in a particular embodiment, said two functional groups are connected to the bead via a two arm linker. In some embodiments, the method comprises the steps of providing a solid support carrying exactly one functional group, deprotecting said functional group, and reacting said group with a mixture of a first and a second sequence specific primer, said first and second primers comprising identical reactive groups, characterized in that at least one of said primers is connected to its reactive group via a cleavable moiety.
In some embodiments, the method comprises the steps of providing a bead carrying exactly one functional group, and deprotecting said functional group and reacting said group with an oligonucleotide representing a first and a second amplification primer which are connected by a cleavable moiety. In some embodiments, the method comprises the steps of providing a bead carrying protected OH groups, protected with two different orthogonal protecting groups, cleaving off one of said orthogonal protecting groups and synthesizing the first primer on the bead, and cleaving off the second of said orthogonal protecting groups and synthesizing the second primer on the bead.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2009/0105081, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods and systems for the capture and enrichment of target nucleic acids and analysis of the enriched target nucleic acids. Additionally or alternatively, detection of a genetic biomarker can include the enrichment of targeted sequences in a solution based format. In some embodiments, solution based capture methods comprise probe derived amplicons wherein said probes for amplification are affixed to a solid support. The solid support comprises support-immobilized nucleic acid probes to capture specific nucleic acid sequences (e.g., target nucleic acids) from, for example, a genomic sample. Probe amplification provides probe amplicons in solution which are hybridized to target sequences. Following hybridization of probe amplicons to target sequences, target nucleic acid sequences present in the sample are enriched by capturing (e.g., via linker chemistry such as biotin, digoxigenin, etc.) and washing the probes and eluting the hybridized target nucleic acids from the captured probes (FIG. 1). The target nucleic acid sequence(s) may be further amplified using, for example, non-specific ligation-mediated PCR (LM-PCR), resulting in an amplified pool of PCR products of reduced complexity compared to the original target sample. In some embodiments, hybridization between the probes and target nucleic acids is performed under preferably stringent conditions sufficient to support hybridization between the solution based probe amplicons, wherein said probes comprise linker chemistry, and complementary regions of the target nucleic acid sample to provide probe/target hybridization complexes.
The complexes are subsequentlycaptured via the linker chemistry and washed under conditions sufficientto remove non-specifically bound nucleic acids and the hybridized targetnucleic acid sequences are eluted from the captured probe/targetcomplexes. Additionally or alternatively, detection of a geneticbiomarker can include methods of isolating and reducing the geneticcomplexity of a plurality of nucleic acid molecules, the methodcomprising the steps of exposing fragmented, denatured nucleic acidmolecules of said population to multiple, different oligonucleotideprobes that are bound on a solid support under hybridizing conditions tocapture nucleic acid molecules that specifically hybridize to saidprobes, or exposing fragmented, denatured nucleic acid molecules of saidpopulation to multiple, different oligonucleotide probes underhybridizing conditions followed by binding the complexes of hybridizedmolecules to a solid support to capture nucleic acid molecules thatspecifically hybridize to said probes, wherein in both cases saidfragmented, denatured nucleic acid molecules have an average size ofabout 100 to about 1000 nucleotide residues, preferably about 250 toabout 800 nucleotide residues and most preferably about 400 to about 600nucleotide residues, separating unbound and non-specifically hybridizednucleic acids from the captured molecules, eluting the capturedmolecules, and optionally repeating the aforementioned processes for atleast one further cycle with the eluted captured molecules. Additionallyor alternatively, detection of a genetic biomarker can include anenrichment method for target nucleic acid sequences in a genomic sample,such as exons or variants, preferably SNP sites. This can beaccomplished by synthesizing genomic probes specific for a region of thegenome to capture complementary target nucleic acid sequences containedin a complex genomic sample. 
In some embodiments, the method further comprises determining the nucleic acid sequence of the captured and eluted target molecules, in particular by means of performing sequencing by synthesis reactions. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting coding region variation relative to a reference genome, in particular relative to a reference genome that comprises fragmented, denatured genomic nucleic acid molecules, the method as previously described further comprising determining the nucleic acid sequence of the captured and eluted target molecules, in particular by means of performing sequencing by synthesis reactions and comparing the determined sequence to a sequence in a database, in particular to a sequence in a database of polymorphisms in the reference genome to identify variants from the reference genome. In embodiments, nucleic acid (pre-selected) capture probes are immobilized onto a solid support (e.g., slide, chip, bead, etc.) using any number of recognized methods (e.g., spotting, photolithography, in situ synthesis, etc.). In preferred embodiments, the probes are synthesized in situ by maskless array synthesis on a substrate and subsequently amplified by, for example, PCR resulting in probe derived amplicons in solution. In some embodiments, the probe sequences as synthesized comprise primer binding sites for amplification at one or both the 3′ and 5′ termini (e.g., at or near the ends) of the probes. In some embodiments, the sequence of the primer binding sites on the probes is the same at both the 3′ and 5′ ends of the probes, whereas in other embodiments the sequence of the primer binding sites at the 3′ end is different from the sequence at the 5′ end. 
In some embodiments,amplification primers for probe amplification further comprise arestriction endonuclease site, for example an MlyI site for easy removalof primer sequences from the final captured target, wherein one of theprimers (e.g., forward or reverse primer) further comprises linkerchemistry such as a binding moiety or sequence (e.g., biotin,digoxigenin, HIS tag, etc.) and are deposited onto the support with theimmobilized probes along with reagents necessary for exponential PCRamplification (e.g., PCR procedures for exponential amplification oftargets as known to a skilled artisan). PCR is performed therebycreating amplicons of probe capture sequences such that one of thestrands comprises linker chemistry, such as a binding moiety orsequence. The amplicon containing solution is transferred to a vessel(e.g., tube, well of a 96 well plate, etc.) and, in some embodiments,purified from reaction components. An additional round of amplificationis preferentially performed on the probe derived amplicons usingasymmetric PCR, wherein the linker chemistry labeled primer is inabundance compared to the non-labeled primer to preferentiallysynthesize single stranded binding moiety/sequence labeled amplicons.The amplicons are purified away from reaction components and transferredto a vessel, denatured nucleic acid sample is added, and hybridizationis allowed to occur. Following hybridization, labeled amplicon/targetnucleic acid complexes are captured. For example, when biotin is thebinding moiety a streptavidin (SA) coated substrate such as SA coatedbeads (e.g., paramagnetic beads/particles) are used to capture thebiotin labeled amplicon/target complex. 
The SA bound complex is washed and the hybridized target nucleic acids are eluted from the complex and utilized in downstream applications, such as sequencing applications. Additionally or alternatively, detection of a genetic biomarker can include methods for isolating and reducing the complexity of a plurality of nucleic acid sequences comprising providing a solid support wherein said solid support comprises hybridization probes hybridizable to target nucleic acid sequences and providing a fragmented nucleic acid sample comprising target nucleic acid sequences, amplifying the hybridization probes wherein the amplification products comprise a binding moiety and wherein the amplification products are in solution, hybridizing the nucleic acid sample to the amplification products in solution under conditions such that hybridization between the amplification products and target nucleic acid sequences is allowed to occur, separating the hybridized target nucleic acid sequences/amplification product complexes from non-specifically hybridized nucleic acids by said binding moiety, and eluting the hybridized target nucleic acid sequences from the complex, thereby isolating and reducing the complexity of a plurality of nucleic acid sequences. In some embodiments, the solid support is a microarray slide. In some embodiments, the target nucleic acid sample is fragmented genomic DNA with or without adaptor molecules at one or both ends of the fragments. In some embodiments, the hybridization probes comprise a restriction endonuclease site, for example a MlyI site. In some embodiments, probe amplification comprises exponential polymerase chain reaction, and may further comprise asymmetric non-exponential amplification. In some embodiments, the binding moiety is biotin and the capture substrate, such as a bead, for example a paramagnetic particle, is coated with streptavidin for separation of the target nucleic acid/amplification product complex from non-specifically hybridized target nucleic acids. 
In some embodiments, the captured target nucleic acid/amplification product complexes are washed prior to elution of the bound target nucleic acids. In some embodiments, the eluted target nucleic acids are sequenced. Additionally or alternatively, detection of a genetic biomarker can include methods for isolating and reducing the complexity of a plurality of nucleic acid sequences comprising providing a solid support wherein said solid support comprises hybridization probes hybridizable to target nucleic acid sequences and providing a fragmented nucleic acid sample comprising target nucleic acid sequences, amplifying the hybridization probes wherein the amplification products comprise a binding moiety and wherein the amplification products are in solution, hybridizing the nucleic acid sample to the amplification products in solution under conditions such that hybridization between the amplification products and target nucleic acid sequences is allowed to occur, separating the hybridized target nucleic acid sequences/amplification product complexes from non-specifically hybridized nucleic acids by said binding moiety, eluting the hybridized target nucleic acid sequences from the complex, thereby isolating and reducing the complexity of a plurality of nucleic acid sequences, and sequencing the eluted target nucleic acid sequences. In some embodiments, the solid support is a microarray slide. In some embodiments, the target nucleic acid sample is fragmented genomic DNA with or without adaptor molecules at one or both ends of the fragments. In some embodiments, the hybridization probes comprise a restriction endonuclease site, for example a MlyI site. In some embodiments, probe amplification comprises exponential polymerase chain reaction, and may further comprise asymmetric non-exponential amplification. 
In some embodiments, the binding moiety is biotin and the capture substrate, such as a bead, for example a paramagnetic particle, is coated with streptavidin for separation of the target nucleic acid/amplification product complex from non-specifically hybridized target nucleic acids. In some embodiments, the captured target nucleic acid/amplification product complexes are washed prior to elution of the bound target nucleic acids.
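
The fragment size ranges recited above for the fragmented, denatured nucleic acid molecules (about 100 to about 1000, preferably about 250 to about 800, and most preferably about 400 to about 600 nucleotide residues) can be illustrated with a minimal Python sketch. The tier labels, the function name, and the treatment of the "about" bounds as hard inclusive limits are assumptions made for illustration only and are not taken from the incorporated publication.

```python
# Illustrative sketch only. Tier labels and hard inclusive bounds are
# assumptions; the text recites "about" ranges, not exact cutoffs.
SIZE_TIERS = [
    ("most preferred", 400, 600),
    ("preferred", 250, 800),
    ("acceptable", 100, 1000),
]


def classify_average_fragment_size(lengths):
    """Return (mean length, tier) for fragment lengths given in nucleotides."""
    if not lengths:
        raise ValueError("no fragment lengths supplied")
    mean_len = sum(lengths) / len(lengths)
    # Tiers are ordered narrowest first, so the first match is the best tier.
    for tier, low, high in SIZE_TIERS:
        if low <= mean_len <= high:
            return mean_len, tier
    return mean_len, "outside stated ranges"
```

For example, under these assumptions a sample whose fragments average 500 nucleotides would fall in the narrowest range, while one averaging 50 nucleotides would fall outside all three.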

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2018/077847, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of making a library of target nucleic acid molecules from a sample comprising a plurality of target molecules, the method comprising for substantially each target molecule: ligating a single adaptor to a target molecule forming a circular molecule, wherein the adaptor comprises two barcodes, two primer binding sites situated between the two barcodes, wherein the primers annealing to the binding sites are facing away from each other, and at least one modified nucleotide effecting a strand synthesis termination by a nucleic acid polymerase situated between the two primer binding sites; annealing a forward primer complementary to the adaptor to one strand of the target molecule; extending the forward primer up to the modified nucleotide, thereby producing a first strand; annealing a reverse primer complementary to the adaptor to the first strand; extending the first primer, thereby producing the second strand and a double-stranded molecule comprising the first strand and the second strand wherein the two barcodes are flanking the target sequence. In some embodiments, at least one of the forward and the reverse primer comprises a 5′-flap sequence not complementary to the adaptor and comprising an additional primer binding site. Then the method further comprises a step of annealing an additional primer to the sequence complementary to the flap sequence in the forward primer and extending the additional primer thereby producing a double-stranded molecule comprising two additional primer sites and the two barcodes flanking the target sequence. In some embodiments, the target molecule and the adaptor are single-stranded. 
In other embodiments, the target molecule and the adaptor are double-stranded and the circular molecule is at least partially denatured prior to annealing of the primer. In some embodiments the barcode is a nucleotide sequence 4-20 bases long. The modified nucleotide effecting a strand synthesis termination by a nucleic acid polymerase may be selected from abasic nucleotides, nucleotides with protein side groups, synthetic nucleotide AraC (cytarabine) or deoxyuracil, isoguanine, 5-methylisocytosine, ethylene glycol spacers, nucleotides with bulky analogues such as fluorophores, or unnatural base pair (UBP) “d5SICS-dNaM” nucleic acid analogues. The ligation may be selected from overhang ligation, T-A ligation, blunt-end ligation and topoisomerase catalyzed ligation. In some embodiments, the adaptor has a photocleavable linker on one end. In these embodiments, the linker is ligated on one end and exposed to UV light to enable ligation on the other end. In some embodiments, the additional primers are sequencing primers. Additionally or alternatively, detection of a genetic biomarker can include a library of target nucleic acid molecules wherein each molecule is a circular molecule comprising a target sequence and an adaptor linking the ends of the target sequence, the adaptor comprising: two barcodes; two primer binding sites situated between the two barcodes, wherein the primers annealing to the binding sites are facing away from each other; at least one modified nucleotide effecting a strand synthesis termination by a nucleic acid polymerase situated between the two primer binding sites. In some embodiments, the barcode is a nucleotide sequence 4-20 bases long. 
The modified nucleotide effecting a strand synthesis termination by a nucleic acid polymerase may be selected from abasic nucleotides, nucleotides with protein side groups, synthetic nucleotide AraC (cytarabine) or deoxyuracil, isoguanine, 5-methylisocytosine, ethylene glycol spacers, nucleotides with bulky analogues such as fluorophores, or unnatural base pair (UBP) “d5SICS-dNaM” nucleic acid analogues. Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing target nucleic acids in a sample comprising a plurality of target molecules, the method comprising: creating a library of target nucleic acid molecules from the sample by ligating a single double-stranded adaptor to substantially each double-stranded target molecule forming a double-stranded circular molecule, wherein the adaptor comprises two barcodes, two primer binding sites situated between the two barcodes, wherein the primers annealing to the binding sites are facing away from each other, and at least one modified nucleotide effecting a strand synthesis termination by a nucleic acid polymerase situated between the two primer binding sites; denaturing at least a portion of the double-stranded circular target molecule; annealing a forward primer complementary to the adaptor to one strand of the target molecule; extending the forward primer up to the modified nucleotide, thereby producing a first strand; annealing a reverse primer complementary to the adaptor to the first strand; extending the first primer, thereby producing the second strand and a double-stranded molecule comprising the first strand and the second strand wherein the two barcodes are flanking the target sequence; amplifying the double-stranded molecule; and sequencing the amplified products of the double-stranded molecule. In some embodiments, at least one of the forward and the reverse primer comprises a 5′-flap sequence not complementary to the adaptor and comprising an additional primer binding site. 
In some embodiments, the method further comprises after extending the first primer, annealing an additional primer to the sequence complementary to the flap sequence in the forward primer and extending the additional primer thereby producing a double-stranded molecule comprising two additional primer sites and the two barcodes flanking the target sequence. In some embodiments, amplifying or sequencing may be performed with the additional primers.
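
The geometry of the circular adaptor/target molecule described above can be pictured with a toy Python sketch. It models a single strand as a plain string, uses a placeholder character for the chain-terminating modified nucleotide, and ignores strand complementarity and primer orientation entirely. The function names, the placeholder symbol, and the example sequences are hypothetical and are not taken from the incorporated publication; only the 4-20 base barcode length comes from the text.

```python
import re

STOP = "X"  # hypothetical placeholder for the modified, chain-terminating nucleotide


def build_adaptor(barcode_a, site_1, site_2, barcode_b):
    """Lay out an adaptor as barcode, primer site, STOP, primer site, barcode."""
    for barcode in (barcode_a, barcode_b):
        # The text recites barcodes that are 4-20 bases long.
        if not re.fullmatch(r"[ACGT]{4,20}", barcode):
            raise ValueError(f"barcode must be 4-20 bases of A/C/G/T: {barcode!r}")
    return barcode_a + site_1 + STOP + site_2 + barcode_b


def linear_copy_template(target, adaptor):
    """Open the circle (target joined to adaptor) at the STOP position.

    Because synthesis cannot pass STOP, the copied region is the circle read
    from just after STOP around to just before it.
    """
    if STOP in target:
        raise ValueError("target must not contain the STOP placeholder")
    circle = target + adaptor  # the last base is understood to join the first
    cut = circle.index(STOP)
    return circle[cut + 1:] + circle[:cut]
```

With a target of `TTTTGGGG`, barcodes `ACGT` and `TGCA`, and primer sites `CCCCC` and `GGGGG`, this sketch yields a linear string in which the two primer sites sit at the outer ends and the two barcodes flank the target, which is the arrangement the passage describes.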

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2017/123316, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include a targeted sequencing workflow where an inputsample comprising a sufficient quantity of genomic material is providedsuch that minimal or no amplification processes are required prior tosequencing. In some embodiments, the input sample is derived from anintact tumor or from lymph nodes. In some embodiments, the input sampleis obtained through homogenization of an intact tumor sample (whole orpartial) and/or one or more lymph nodes obtained from a patient ormammalian subject. In some embodiments, the input sample is derived froma sufficient quantity of blood, including whole blood or any fractionthereof. In some embodiments, the input sample is derived from canceroustissue. In some embodiments, the input sample is derived fromprecancerous tissue. In some embodiments, the targeted sequencingworkflow comprises one or more amplification steps (e.g. a pre-captureamplification step, an amplification step post-capture) prior tosequencing, where each amplification step prior to sequencing comprisesfrom 0 to 3 amplification cycles, and wherein an aggregate ofamplification cycles prior to sequencing does not exceed 4. In otherembodiments, the targeted sequencing workflow comprises one or moreamplification steps (e.g. a pre-capture amplification step, anamplification step post-capture) prior to sequencing, where eachamplification step prior to sequencing comprises from 0 to 2amplification cycles, and wherein an aggregate of amplification cyclesprior to sequencing does not exceed 3. In yet other embodiments, thetargeted sequencing workflow comprises one amplification step prior tosequencing (e.g. 
either a pre-capture amplification step or anamplification step post-capture), where the single amplification stepprior to sequencing comprises from 0 to 3 amplification cycles. Infurther embodiments, the targeted sequencing workflow comprises oneamplification step prior to sequencing, where the single amplificationstep prior to sequencing comprises from 1 to 3 cycles. In yet furtherembodiments, the targeted sequencing workflow comprises oneamplification step prior to sequencing, where the single amplificationstep prior to sequencing comprises 1 cycle. In even further embodiments,the targeted sequencing workflow comprises one amplification step priorto sequencing, where the single amplification step prior to sequencingcomprises 2 cycles. In some embodiments, either or both of thepre-capture amplification step or the amplification step post-capturebut prior to sequencing utilizes LM-PCR. Additionally or alternatively,detection of a genetic biomarker can include a method of sequencinggenomic material within a sample comprising: homogenizing a tumor sampleand/or lymph node sample to provide a homogenized sample; isolating atleast 0.5 micrograms of genomic material from the homogenized sample;preparing the at least 0.5 micrograms of isolated genomic material forsequencing; and sequencing the prepared genomic material. In someembodiments, the method does not comprise any amplification steps priorto sequencing. In some embodiments, the method comprises at least onepre-capture or post-capture amplification step, wherein an aggregatenumber of amplification cycles conducted during the at least onepre-capture or post-capture amplification step is at most 4 cycles. Insome embodiments, the aggregate number of amplification cycles is 3. Insome embodiments, the aggregate number of amplification cycles is 2. 
Insome embodiments, the preparing of the at least 0.5 micrograms ofisolated genomic material for sequencing comprises hybridizing the atleast 0.5 micrograms of isolated genomic to capture probes and capturingthe hybridized genomic material. In some embodiments, an amount ofcaptured genomic material ranges from about 90 ng to about 900 ng. Insome embodiments, 1 or 2 amplification cycles are performed on thecaptured genomic material. In some embodiments, the homogenized samplecomprises a representative sampling of cells. In some embodiments, atleast 1 microgram of genomic material is isolated from the homogenizedsamples. In some embodiments, at least 5 micrograms of genomic materialis isolated from the homogenized samples. In some embodiments, at least10 micrograms of genomic material is isolated from the homogenizedsamples. Additionally or alternatively, detection of a genetic biomarkercan include a method of sequencing DNA within a sample comprisingisolating at least 0.5 micrograms of DNA from a blood sample; preparingthe at least 0.5 micrograms of isolated DNA for sequencing, andsequencing the prepared DNA. In some embodiments, the method comprises 0amplification steps prior to sequencing. In some embodiments, thepreparing of the at least 0.5 micrograms of isolated DNA for sequencingcomprises hybridizing the at least 0.5 micrograms of isolated genomic tocapture probes and capturing the hybridized genomic material. In someembodiments, an amount of captured genomic material ranges from about 90ng to about 900 ng. In some embodiments, 1 or 2 amplification cycles areperformed on the captured genomic material. In some embodiments, atleast 1 microgram of DNA is isolated from the blood sample. 
Additionallyor alternatively, detection of a genetic biomarker can include a methodof targeted representational sequencing comprising: (i) homogenizing atleast a portion of a tumor, one or more whole or partial lymph nodes, orany combination thereof to provide a homogenized sample; (ii) extractinggenomic material from the homogenized sample; (iii) capturing theextracted genomic material onto beads; and (iv) sequencing the capturedgenomic material; wherein the targeted representational sequencingcomprises performing at most 4 amplification cycles prior to sequencingof the captured genomic material. In some embodiments, the at most 3amplification cycles may be conducted prior to capture of the extractedgenomic material or after capture of the extracted genomic material, orany combination thereof. In some embodiments, no pre-captureamplification cycles are conducted. In some embodiments, an amount ofcaptured genomic material ranges from about 90 ng to about 900 ng. Insome embodiments, from 1 to 3 amplification cycles are performedfollowing capture of the extracted genomic material, but prior tosequencing. In some embodiments, at least 0.5 micrograms of genomicmaterial is extracted from the homogenized sample. In some embodiments,at least 100 times more genomic material is derived from the homogenizedsample as compared with an amount of input material used in a sequencingmethod requiring more than 4 amplification cycles. Additionally oralternatively, detection of a genetic biomarker can include a method ofsequencing DNA within a sample comprising: providing at least 0.5micrograms of input genomic material, the at least 0.5 micrograms ofgenomic material derived from a tumor sample, a lymph node sample, or ablood sample, isolating DNA from the input genomic sample, preparing theisolated DNA for sequencing, and sequencing the prepared DNA, whereinthe method does not comprise any amplification steps. 
In someembodiments, the at least 0.5 micrograms of input genomic material isderived from multiple histological and/or biopsy specimens. In someembodiments, the at least 0.5 micrograms of input genomic material isderived from a homogenized tumor sample. In some embodiments, the atleast 0.5 micrograms of input genomic material is derived from ahomogenized lymph node sample. In some embodiments, the at least 0.5micrograms of input genomic material is a representative sampling of thetumor sample, lymph node sample, or blood sample from which it isderived. In some embodiments, the sequencing is performed using anext-generation sequencing method. In some embodiments, sequencing isperformed using a synthesis sequencing methodology. Additionally oralternatively, detection of a genetic biomarker can include a method ofreducing PCR-introduced mutations during sequencing comprising isolatingDNA from a sample comprising a sufficient amount of genomic material;preparing the isolated DNA for sequencing; and sequencing the preparedDNA, wherein the method comprises at most 3 amplification cycles priorto sequencing. In some embodiments, the method comprises 1 or 2amplification cycles prior to sequencing. In some embodiments,sufficient amount of input genomic material is an amount such that nopre-capture amplification cycles are utilized. In some embodiments, thesample is derived from a patient suspected of having cancer. In someembodiments, the sample is derived from a patient diagnosed with cancer.In some embodiments, the sample is derived from a patient at risk ofdeveloping cancer. In some embodiments, the sample is derived fromhealthy tissue samples. In some embodiments, 0.5 micrograms of DNA isisolated from the sample. In some embodiments, at least 1 microgram ofgenomic material is isolated from the sample. In some embodiments, atleast 5 micrograms of genomic material is isolated from the sample. 
Insome embodiments, at least 10 micrograms of genomic material is isolatedfrom the sample. Additionally or alternatively, detection of a geneticbiomarker can include a sequencing method where PCR-introduced mutationsare reduced, the sequencing method comprising capturing at least 0.05micrograms of genomic material, and performing between 0 and 2amplification cycles prior to sequencing. In some embodiments, 0amplification cycles are conducted. In other embodiments, 1amplification cycle is conducted. In yet other embodiments, 2amplification cycles are conducted. Additionally or alternatively,detection of a genetic biomarker can include a sequence capture methodwhere PCR-introduced biases in the proportional representation of genomecontent are reduced, the sequencing method comprising providing an inputsample comprising at least 0.5 micrograms of genomic material, and wherethe sequence capture method comprises performing between 0 and 2amplification cycles prior to sequencing. In some embodiments, 0amplification cycles are conducted. In other embodiments, 1amplification cycle is conducted. In yet other embodiments, 2amplification cycles are conducted. In some embodiments, the inputsample comprises at least 1 microgram of genomic material. In someembodiments, the input sample comprises at least 5 micrograms of genomicmaterial. In some embodiments, the input sample comprises at least 10micrograms of genomic material. Additionally or alternatively, detectionof a genetic biomarker can include a sequence capture method wherePCR-introduced mutations are eliminated, the sequence capture methodcomprising preparing an input sample comprising at least 0.5 microgramsof genomic material. In some embodiments, the input sample comprises atleast 1 microgram of genomic material. In some embodiments, the inputsample comprises at least 5 micrograms of genomic material. In someembodiments, the input sample comprises at least 10 micrograms ofgenomic material. 
In another aspect is a sequence capture method where a step of removing PCR-duplicate reads prior to sequencing is eliminated, the sequence capture method comprising providing an input sample comprising at least 0.5 micrograms of genomic material. In some embodiments, the input sample comprises at least 1 microgram of genomic material. In some embodiments, the input sample comprises at least 5 micrograms of genomic material. In some embodiments, the input sample comprises at least 10 micrograms of genomic material. Additionally or alternatively, detection of a genetic biomarker can include a sequencing method where PCR-introduced mutations are virtually eliminated, the sequencing method comprising capturing at least 0.05 micrograms of genomic material. In some embodiments, about 0.05 micrograms of genomic material are provided after capture of the genomic material. In some embodiments, 1 or 2 post-capture amplification cycles are performed prior to sequencing.
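
The amplification limits recited above for the targeted sequencing workflow (for example, each amplification step prior to sequencing comprising from 0 to 3 cycles with an aggregate not exceeding 4, or from 0 to 2 cycles with an aggregate not exceeding 3) can be expressed as a simple check. The following Python sketch is illustrative only; the function name, its parameters, and the message wording are assumptions rather than part of the incorporated publication.

```python
def check_amplification_plan(cycles_per_step, max_per_step=3, max_total=4):
    """Return a list of problems with a planned set of amplification steps.

    cycles_per_step lists the cycle count of each step performed prior to
    sequencing, e.g. [pre_capture_cycles, post_capture_cycles]. The defaults
    mirror the "0 to 3 per step, aggregate not exceeding 4" embodiment.
    """
    problems = []
    for index, cycles in enumerate(cycles_per_step):
        if not 0 <= cycles <= max_per_step:
            problems.append(f"step {index}: {cycles} cycles outside 0-{max_per_step}")
    total = sum(cycles_per_step)
    if total > max_total:
        problems.append(f"aggregate {total} exceeds {max_total}")
    return problems  # an empty list means the plan satisfies both limits
```

Under these assumptions, a plan of one pre-capture cycle and three post-capture cycles passes the default limits, whereas three cycles in each step fails on the aggregate. Passing `max_per_step=2, max_total=3` checks the stricter embodiment.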

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/132276, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods in which regions of a tumor sample predicted not to respond to a first therapeutic agent are excised from the sample with an automated dissection tool, mutations correlated with predictive biomarkers are detected in the excised region using NGS, and additional samples of the tumor are stained for one or more predictive biomarker(s) identified by NGS. Additionally or alternatively, detection of a genetic biomarker can include a method comprising: obtaining a first sample of a tumor, wherein the first sample is histochemically stained for a first predictive biomarker for a first therapeutic agent; excising one or more region(s) from the first sample with an automated dissection tool, wherein the excised region has a staining pattern for the first predictive biomarker indicating that the region is unlikely to respond to the first therapeutic agent; detecting with a next generation sequencer one or more mutations predictive of a response to one or more additional therapeutic agents in a nucleic acid sample derived from the excised region(s) of the first sample; staining one or more additional samples of the tumor for one or more additional predictive biomarker(s) correlating to the one or more mutations identified in the samples, the one or more additional predictive biomarkers being predictive of a response to one or more of the additional therapeutic agent(s). 
Additionally or alternatively,detection of a genetic biomarker can include a system comprising: (a) anucleic acid sample derived from one or more regions excised from afirst sample of a tumor, wherein the first sample of the tumor isstained for a first predictive biomarker, and further wherein the one ormore regions excised from the section have a staining pattern of thefirst predictive biomarker indicating that at least a portion of thetumor is unlikely to respond to a first therapeutic agent; (b) a nextgeneration sequencer adapted to identify the presence or absence ofmutations correlating to one or more additional predictive biomarkers;(c) a laboratory information system (LIS) comprising a database, thedatabase containing: (c1) mutation analysis of a nucleic acid sample bynext generation sequencing, wherein the mutation analysis indicates atleast the presence or absence of mutations in the nucleic acid sample,the mutations correlating to one or more additional predictivebiomarker(s) for one or more additional therapeutic agent(s); and (c2)instructions for directing an automated slide stainer to stain a secondsample of the tumor with the one or more additional predictivebiomarkers identified by the mutation analysis.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 10,023,917, which is hereby incorporated by referencein its entirety. For example, detection of a genetic biomarker caninclude High Resolution Melting (HRM) assays as a prescreeningdiagnostic method to diagnose mutations in the hot spot regions of themost common genes (KRAS, BRAF, PIK3CA, AKT1) concerning the RAS/RAF/MAPKand PI3K/PTEN/AKT pathway. Additionally or alternatively, detection of agenetic biomarker can include pairs of amplification primers, which areuseful for HRM analysis of genes which are important for predictingresponsiveness to cancer therapeutic agents. Additionally oralternatively, detection of a genetic biomarker can include thefollowing pairs of amplification primers for amplification and analysisof KRAS, exons 2 and 3, BRAF, exon 15, PIK3CA, exons 7, 9 and 20, andAKT1, exon 2. Additionally or alternatively, detection of a geneticbiomarker can include a composition or reaction mixture comprising atleast one pair of amplification primers as disclosed above. Thecomposition may be used for PCR amplification of nucleic acids during anucleic acid amplification reaction, and also PCR amplification andmonitoring in real time. According to some embodiments, a mixturecomprises at least: a pair of amplification primers as disclosed above;a thermostable DNA Polymerase; a mix of deoxynucleoside triphosphateswhich is usually dA, dG, dC and dT, or dA, dG, dC and dU; and a buffer.In some further embodiments, when suitable for amplification anddetection in real time, of one or more specific nucleic acid targetsequence(s) such a composition additionally comprises a nucleic aciddetecting entity such as a fluorescent hybridization probe, or afluorescent, double stranded DNA binding dye. In some embodiments, sucha DNA double stranded Dye is a dye which can be used to perform HRMcurve analysis. 
In some embodiments, the pair of amplification primersis designed to amplify a specific sequence of interest according tostandard methods known in the art of molecular biology. In someembodiments, when brought into contact with a sample that shall beanalyzed, such a PCR reaction mixture additionally comprises an at leastpartially purified DNA or other nucleic acid which putatively comprisesa specific sequence of interest. Also, in some such embodiments, theconcentrations of all reagents included are generally as known topersons skilled in the art and can be optimized for specific adaptationsaccording to standard protocols. In some such embodiments, theconcentration of the fluorescent, double stranded DNA binding dye isbetween approximately 0.1 to 10.0 μg/ml. In some embodiments, a kit isprovided. Some illustrative embodiments of kits include at least onepair of amplification primers. Some embodiments of kits may furthercomprise one, several, or all of the following additional ingredients; athermostable DNA Polymerase; a mix of deoxynucleoside triphosphateswhich is usually dA, dG, dC and dT, or dA, dG, dC and dU, and a buffer,and a fluorescent, double stranded DNA binding dye, which may be suitedto be used for HRM. Additionally or alternatively, detection of agenetic biomarker can include a method for determining the increasedlikelihood of a response to a targeted treatment of a cancer disease,comprising the steps of: a) isolating genomic DNA from a patient sample;b) amplifying at least one fragment of said DNA by means of PCR with aspecific pair of amplification primers; c) determining, whether saidamplified fragment has a wildtype sequence or comprises a mutation bymeans of a High Resolution Melting Analysis (HRM); and d) correlatingthe presence or absence of a mutation with an increased likelihood ofsuccess of said targeted treatment. In some embodiments, the mutation isidentified by means of a hybridization analysis or by means ofsequencing. 
For example, the patient sample may be Formalin Fixed Paraffin Embedded (FFPE) tissue. In some such cases, HRM Analysis may be performed without any spiking of DNA.
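As a non-limiting illustration of step c) above, the following minimal sketch estimates a melting temperature from a melt curve as the peak of the negative derivative of fluorescence with respect to temperature, and flags a sample whose melting temperature departs from a wildtype reference. The function names, the sigmoid curve model, and the 0.3 degree tolerance are assumptions made for illustration only; they are not taken from U.S. Pat. No. 10,023,917, and practical HRM analysis typically also normalizes curves and compares curve shapes.

```python
import math


def synthetic_melt_curve(temps, tm, width=0.5):
    """Hypothetical sigmoid melt curve: fluorescence falls as the duplex melts."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


def melting_temperature(temps, fluorescence):
    """Estimate Tm as the midpoint of the interval where -dF/dT is largest."""
    best_tm, best_slope = None, float("-inf")
    for i in range(1, len(temps)):
        slope = -(fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i - 1]) / 2.0
    return best_tm


def call_sample(sample_tm, wildtype_tm, tolerance=0.3):
    """Assumed decision rule: a Tm shift beyond the tolerance suggests a variant."""
    if abs(sample_tm - wildtype_tm) <= tolerance:
        return "wildtype-like"
    return "putative variant"


if __name__ == "__main__":
    temps = [75.0 + 0.1 * i for i in range(151)]  # 75.0 to 90.0 in 0.1 degree steps
    wildtype_tm = melting_temperature(temps, synthetic_melt_curve(temps, 82.0))
    sample_tm = melting_temperature(temps, synthetic_melt_curve(temps, 81.0))
    print(call_sample(sample_tm, wildtype_tm))
```

A sample flagged in this way could then be confirmed by hybridization analysis or sequencing, as noted above.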

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,873,908, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods for enriching low abundance alleles (e.g. mutant DNA) in asample that allows subsequent detection of such alleles. Additionally oralternatively, detection of a genetic biomarker can include a method ofenriching a variant of a target nucleic acid in a mixture of nucleicacids from a sample, the target nucleic acid existing in the form of twovariant sequences, wherein said variants differ at a single nucleotideposition, the method comprising, providing the sample that includes thetarget nucleic acid wherein the variant to be enriched is present in thesample in low abundance amongst a large excess of the other variant;providing an oligonucleotide that is complementary to one strand of thetarget nucleic acid at a concentration that is in molar excess to thetarget nucleic acid, wherein the oligonucleotide is attached with anaffinity label and is perfectly matched at the single nucleotideposition with the variant to be enriched and has a mismatch at thesingle nucleotide position with the other variant; providing conditionssuitable for hybridization of the oligonucleotide to the target nucleicacid to generate duplex polynucleotides consisting of theoligonucleotide and one strand of either variant of the target nucleicacid; contacting the duplex polynucleotides with a mismatchintercalating compound that preferentially binds to only the duplexpolynucleotides that contain a mismatch wherein said compound is furthercapable of catalyzing cleavage of one strand of the duplexpolynucleotide at the mismatch site with light; subjecting the duplexpolynucleotides to light resulting in both cleaved and uncleaved duplexpolynucleotides; applying both cleaved and uncleaved duplexpolynucleotides to an 
affinity matrix that recognizes and binds to theaffinity label on the oligonucleotide; providing conditions whereby onlythe cleaved duplex polynucleotide is denatured and removing thedenatured single strand from the affinity matrix; and providing a bufferunder conditions to denature the uncleaved polynucleotide duplex; andcollecting the buffer which contains one strand of the enriched variantof the target nucleic acid. In one embodiment, the mismatchintercalating compound is Rh(bpy)2(chrysi)3+ or Rh(bpy)2(phzi)3+ ortheir respective analogs. Additionally or alternatively, detection of agenetic biomarker can include a further step of amplifying and detectingthe enriched variant of the target nucleic acid. Additionally oralternatively, detection of a genetic biomarker can include a method fordetecting a mutant allele of a target nucleic acid in a mixture ofnucleic acids from a sample wherein the mutant allele differs from awild-type allele at a single nucleotide position and is present in thesample in low abundance amongst a large excess of the wild-type allele,the method comprising enriching the mutant allele in the sample whereinthe enrichment is performed by providing an oligonucleotide that iscomplementary to one strand of the target nucleic acid at aconcentration that is in molar excess to the target nucleic acid,wherein the oligonucleotide is attached with an affinity label and isperfectly matched at the single nucleotide position with the mutantallele and has a mismatch at the single nucleotide position with thewild-type allele; providing conditions suitable for hybridization of theoligonucleotide to the target nucleic acid to generate duplexpolynucleotides consisting of the oligonucleotide and one strand ofeither the mutant allele or the wild-type allele; contacting the duplexpolynucleotides with a mismatch intercalating compound thatpreferentially binds to only the duplex polynucleotides that contain amismatch wherein said compound is further capable of 
catalyzing cleavageof one strand of the duplex polynucleotide at the mismatch site withlight; subjecting the duplex polynucleotides to light resulting in bothcleaved and uncleaved duplex polynucleotides; applying both cleaved anduncleaved duplex polynucleotides to an affinity matrix that recognizesand binds to the affinity label on the oligonucleotide; providingconditions whereby only the cleaved duplex polynucleotide is denaturedand removing the denatured single strand of the wild-type allele fromthe affinity matrix; providing a buffer under conditions to denature theuncleaved polynucleotide duplex; and collecting the buffer whichcontains one strand of the enriched mutant allele of the target nucleicacid; amplifying the enriched mutant allele; and detecting the productof the enriched amplified mutant allele or the signal generated from theenriched amplified mutant allele. In one embodiment, the mismatchintercalating compound is Rh(bpy)2(chrysi)3+ or Rh(bpy)2(phzi)3+ ortheir respective analogs.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 9,399,794, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of detecting the presence or absence of a target nucleic acid in a test sample comprising: inputting into a learning statistical classifier system data from a training set of samples where the amount of the target nucleic acid and a control nucleic acid is known; using the learning statistical classifier system, calculating a plurality of weights for a general linear classifier; building a general linear classifier with the plurality of weights calculated by the learning statistical classifier system; contacting the test sample with a reaction mixture containing reagents necessary to amplify the target and the control nucleic acids by polymerase chain reaction (PCR) under conditions enabling PCR; measuring at least one amplification-dependent parameter for the target and the control nucleic acids to obtain a test set of data; applying the general linear classifier to the test set of data to classify the test sample as containing or not containing the target nucleic acid, thereby detecting the presence or absence of the target nucleic acid in the test sample. In variations of this embodiment, the learning statistical classifier system is selected from SVM, LDA and QDA. In further variations of this embodiment, the amplification-dependent parameter is fluorescence detected during each cycle of amplification. In further variations of this embodiment, the data is cycle-to-threshold (Ct) value. In further variations of this embodiment, the general linear classifier is a piece-wise linear classifier. 
In further variations of this embodiment, a piece-wisefunction determined by constraints placed upon theamplification-dependent parameter for the control nucleic acid is inputinto the piece-wise linear classifier. In further variations of thisembodiment, the target nucleic acid is a nucleic acid variant of a humansequence. Additionally or alternatively, detection of a geneticbiomarker can include a method of detecting the presence or absence of atarget nucleic acid in a test sample comprising: inputting into alearning statistical classifier system data from a training set ofsamples where the amount of the target nucleic acid and a controlnucleic acid is known; using the learning statistical classifier system,calculating a plurality of weights for a general linear classifier;building a general linear classifier with the plurality of weightscalculated by the learning statistical classifier system; subjecting thesample to polymerase chain reaction (PCR); measuring at least oneamplification-dependent parameter for the target and the control nucleicacids to obtain a test set of data; applying the general linearclassifier to the test set of data; classifying the test sample ascontaining or not containing the target nucleic acid, thereby detectingthe presence or absence of the target nucleic acid in the test sample.In variations of this embodiment, the learning statistical classifiersystem is selected from SVM, LDA and QDA. In further variations of thisembodiment, the amplification-dependent parameter is fluorescencedetected during each cycles of amplification. In further variations ofthis embodiment, the data is cycle-to-threshold (Ct) value. In furthervariations of this embodiment, the general linear classifier is apiece-wise linear classifier. In further variations of this embodiment,a piece-wise function determined by constraints placed upon theamplification-dependent parameter for the control nucleic acid is inputinto the piece-wise linear classifier. 
In further variations of thisembodiment, the target nucleic acid is a nucleic acid variant of a humansequence. Additionally or alternatively, detection of a geneticbiomarker can include a method of determining whether a target nucleicacid is present in a test sample comprising: subjecting a training setof samples wherein the amount of the target nucleic acid and a controlnucleic acid is known to polymerase chain reaction (PCR) and measuringat least one amplification-dependent parameter for the target and thecontrol nucleic acids to obtain a training set of data; inputting thedata into a learning statistical classifier system; using the learningstatistical classifier system, calculating a plurality of weights for ageneral linear classifier; building a general linear classifier with theplurality of weights determined by the learning statistical classifiersystem; subjecting the test sample to PCR and measuring at least oneamplification-dependent parameter for the target and the control nucleicacids to obtain a test set of data; applying the general linearclassifier to the test set of data; classifying the test sample ascontaining or not containing the target nucleic acid. In variations ofthis embodiment, the learning statistical classifier system is selectedfrom SVM, LDA and QDA. In further variations of this embodiment, theamplification-dependent parameter is fluorescence detected during eachcycles of amplification. In further variations of this embodiment, thedata is cycle-to-threshold (Ct) value. In further variations of thisembodiment, the general linear classifier is a piece-wise linearclassifier. In further variations of this embodiment, a piece-wisefunction determined by constraints placed upon theamplification-dependent parameter for the control nucleic acid is inputinto the piece-wise linear classifier. In further variations of thisembodiment, the target nucleic acid is a nucleic acid variant of a humansequence. 
Additionally or alternatively, detection of a geneticbiomarker can include a method of determining whether a target nucleicacid is present in a test sample comprising: subjecting a training setof samples wherein the amount of the target nucleic acid and a controlnucleic acid is known to polymerase chain reaction (PCR) and measuringat least one amplification-dependent parameter for the target and thecontrol nucleic acids to obtain a training set of data; inputting thedata into a learning statistical classifier system; using the learningstatistical classifier system, calculating a plurality of weights for ageneral linear classifier; building a general linear classifier with theplurality of weights determined by the learning statistical classifiersystem; subjecting the test sample to PCR and measuring at least oneamplification-dependent parameter for the target and the control nucleicacids to obtain the test set of data; applying the general linearclassifier to the test set of data obtained; classifying the test sampleas containing or not containing the target nucleic acid. In variationsof this embodiment, the learning statistical classifier system isselected from SVM, LDA and QDA. In further variations of thisembodiment, the amplification-dependent parameter is fluorescencedetected during each cycles of amplification. In further variations ofthis embodiment, the data is cycle-to-threshold (Ct) value. In furthervariations of this embodiment, the general linear classifier is apiece-wise linear classifier. In further variations of this embodiment,a piece-wise function determined by constraints placed upon theamplification-dependent parameter for the control nucleic acid is inputinto the piece-wise linear classifier. In further variations of thisembodiment, the target nucleic acid is a nucleic acid variant of a humansequence. 
Additionally or alternatively, detection of a genetic biomarker can include a computer readable medium including code for controlling one or more processors to classify whether a test sample contains a target nucleic acid, the code including instructions to: apply a learning statistical classifier system to a training data set where the amount of the target nucleic acid and a control nucleic acid is known, in order to build a general linear classifier of Formula I; and apply the general linear classifier to a testing data set comprising the data from the test sample to produce a statistically derived decision classifying the test sample as containing or not containing the target nucleic acid. In variations of this embodiment, the learning statistical classifier system is selected from SVM, LDA and QDA. In further variations of this embodiment, the data in the datasets is cycle-to-threshold (Ct) value. In further variations of this embodiment, the general linear classifier is a piece-wise linear classifier. In further variations of this embodiment, the target nucleic acid is a nucleic acid variant of a human sequence. 
Additionally or alternatively, detection of a genetic biomarker can include a system for detecting a target nucleic acid in a test sample comprising: a data acquisition module configured to produce a data set from a training set of samples and one or more test samples, the data set indicating presence and amount of the target nucleic acid and a control nucleic acid; a data processing unit configured to process the data acquired by the acquisition module by applying a learning statistical classifier system to the training data set in order to build a general linear classifier of Formula I, and then apply the general linear classifier of Formula I to the test data set comprising the data from the test sample, to produce a statistically derived decision classifying the test sample as containing or not containing the target nucleic acid; and a display module configured to display the data produced by the data processing unit. In variations of this embodiment, the learning statistical classifier system is selected from SVM, LDA and QDA. In further variations of this embodiment, the general linear classifier is a piece-wise linear classifier.
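By way of a hedged illustration only, the train-then-classify workflow described above can be sketched with a two-class linear discriminant analysis (LDA) over two features, the Ct of the target and the Ct of the control. The function names, the equal-prior assumption, and the training values below are hypothetical; this sketch does not reproduce Formula I of U.S. Pat. No. 9,399,794 and omits the piece-wise variant.

```python
def train_lda(samples, labels):
    """Fit a two-class LDA on (target_ct, control_ct) pairs.

    labels: 1 = target present, 0 = target absent. Equal class priors are
    assumed. Returns the weights (w0, w1) and bias b of a linear classifier.
    """
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]

    def mean(rows):
        n = len(rows)
        return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

    m1, m0 = mean(pos), mean(neg)

    # Pooled within-class covariance (2x2, symmetric).
    sxx = sxy = syy = 0.0
    for rows, m in ((pos, m1), (neg, m0)):
        for x, y in rows:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(samples) - 2
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof

    # Weights = inverse covariance times the difference of class means.
    det = sxx * syy - sxy * sxy
    d0, d1 = m1[0] - m0[0], m1[1] - m0[1]
    w0 = (syy * d0 - sxy * d1) / det
    w1 = (-sxy * d0 + sxx * d1) / det
    b = -0.5 * (w0 * (m1[0] + m0[0]) + w1 * (m1[1] + m0[1]))
    return (w0, w1), b


def classify(weights, bias, target_ct, control_ct):
    """Apply the linear classifier: a positive score means target present."""
    score = weights[0] * target_ct + weights[1] * control_ct + bias
    return "target present" if score > 0 else "target absent"


# Hypothetical training set: (target Ct, control Ct) with known status.
TRAINING = [(24.8, 24.1), (26.0, 24.9), (25.1, 23.7), (27.2, 25.3),
            (37.9, 24.3), (39.0, 25.0), (36.5, 23.8), (38.4, 24.6)]
STATUS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    w, b = train_lda(TRAINING, STATUS)
    print(classify(w, b, 25.5, 24.4))
    print(classify(w, b, 38.0, 24.5))
```

Including the control Ct as a feature lets the decision boundary account for run-to-run variation in overall amplification, which is one plausible reason a control nucleic acid is measured alongside the target.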

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,382,581, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includea method of allele-specific amplification of a variant of a targetsequence, the target existing in the form of several variant sequences,the method comprising, (a) hybridizing a first oligonucleotide and asecond oligonucleotide to at least one variant of the target sequence;wherein the first oligonucleotide is at least partially complementary toone or more variants of the target sequence, and the secondoligonucleotide is at least partially complementary to one or morevariants of the target sequence, and has at least one selectivenucleotide complementary to only one variant of the target sequence;wherein said second oligonucleotide comprises both a nucleotide with abase covalently modified at the exocyclic amino group and a modifiedphosphate having a structure:

wherein A and B represent a nucleotide chain, D is OH or CH3, and Acc is an electron acceptor or an electron acceptor substituted with a residue R wherein R is an organic substituent, wherein Acc is selected from the group consisting of CN, SO2-R′, in which R′ comprises at least one amino-substituted alkyl, an optionally substituted aryl or an optionally substituted heterocycle, and a six membered N+ heterocycle with at least one alkylated N-atom in ortho- or para-position, said heterocycle selected from the group consisting of pyridinium, pyrimidinium, and quinolinium; (b) providing conditions suitable for oligonucleotide extension by a nucleic acid polymerase; (c) extending said first oligonucleotide and said second oligonucleotide by said nucleic acid polymerase, wherein said nucleic acid polymerase is capable of extending said second oligonucleotide efficiently when said oligonucleotide is hybridized to a variant of the target sequence which is complementary to said at least one selective nucleotide, and substantially less efficiently when said second oligonucleotide is hybridized to a variant of the target sequence which is not complementary to said at least one selective nucleotide. 
Additionally oralternatively, detection of a genetic biomarker can include a method ofdetecting a variant of a target sequence, the target existing in theform of several variant sequences, the method comprising, (a)hybridizing a first oligonucleotide and a second oligonucleotide to atleast one variant of the target sequence; wherein the firstoligonucleotide is at least partially complementary to one or morevariants of the target sequence, and the second oligonucleotide is atleast partially complementary to one or more variants of the targetsequence, and has at least one selective nucleotide complementary toonly one variant of the target sequence; wherein said secondoligonucleotide comprises both a nucleotide with a base covalentlymodified at the exocyclic amino group and a modified phosphate having astructure:

wherein A and B represent a nucleotide chain, D is OH or CH3, and Acc is an electron acceptor or an electron acceptor substituted with a residue R wherein R is an organic substituent, wherein Acc is selected from the group consisting of CN, SO2-R′, in which R′ comprises at least one amino-substituted alkyl, an optionally substituted aryl or an optionally substituted heterocycle, and a six membered N+ heterocycle with at least one alkylated N-atom in ortho- or para-position, said heterocycle selected from the group consisting of pyridinium, pyrimidinium, and quinolinium; (b) providing conditions suitable for oligonucleotide extension by a nucleic acid polymerase; (c) extending said first oligonucleotide and said second oligonucleotide by said nucleic acid polymerase, wherein said nucleic acid polymerase is capable of extending said second oligonucleotide efficiently when said oligonucleotide is hybridized to a variant of the target sequence which is complementary to said at least one selective nucleotide, and substantially less efficiently when said second oligonucleotide is hybridized to a variant of the target sequence which is not complementary to said at least one selective nucleotide; (d) detecting products of said oligonucleotide extension, wherein said extension signifies the presence of the variant of said target sequence to which said second oligonucleotide has a complementary selective nucleotide. Additionally or alternatively, detection of a genetic biomarker can include an oligonucleotide for performing an allele-specific amplification of a target sequence, the target existing in the form of several variant sequences, the oligonucleotide comprising, (a) a sequence at least partially complementary to a portion of one or more variants of said target sequence; (b) at least one selective nucleotide complementary to only one variant of the target sequence; (c) a nucleotide with a base covalently modified at the exocyclic amino group; (d) a modified phosphate having a structure:

wherein A and B represent a nucleotide chain, D is OH or CH3, and Acc is an electron acceptor or an electron acceptor substituted with a residue R wherein R is an organic substituent, wherein Acc is selected from the group consisting of CN, SO2-R′, in which R′ comprises at least one amino-substituted alkyl, an optionally substituted aryl or an optionally substituted heterocycle, and a six membered N+ heterocycle with at least one alkylated N-atom in ortho- or para-position, said heterocycle selected from the group consisting of pyridinium, pyrimidinium, and quinolinium. Additionally or alternatively, detection of a genetic biomarker can include a reaction mixture for allele-specific amplification of a target sequence, the target existing in the form of several variant sequences, the mixture comprising, (a) a first oligonucleotide, at least partially complementary to one or more variants of the target sequence; and (b) a second oligonucleotide, at least partially complementary to one or more variants of the target sequence, and having at least one selective nucleotide complementary to only one variant of the target sequence; wherein said second oligonucleotide comprises both a nucleotide with a base covalently modified at the exocyclic amino group and a modified phosphate having a structure:

wherein A and B represent a nucleotide chain, D is OH or CH3, and Acc is an electron acceptor or an electron acceptor substituted with a residue R wherein R is an organic substituent, wherein Acc is selected from the group consisting of CN, SO2-R′, in which R′ comprises at least one amino-substituted alkyl, an optionally substituted aryl or an optionally substituted heterocycle, and a six membered N+ heterocycle with at least one alkylated N-atom in ortho- or para-position, said heterocycle selected from the group consisting of pyridinium, pyrimidinium, and quinolinium; (c) a nucleic acid polymerase; (d) nucleoside triphosphates; and (e) a buffer suitable for the extension of nucleic acids by the nucleic acid polymerase.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,279,146, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includemethods and compositions for enriching low abundance alleles (e.g.mutant DNA) in a sample that allows subsequent detection of suchalleles. Additionally or alternatively, detection of a genetic biomarkercan include a method of enriching a variant of a target nucleic acidsequence in a mixture of nucleic acids from a sample, the target nucleicacid existing in the form of two variant sequences, wherein saidvariants differ at a single nucleotide position, the method comprising,providing the sample that includes the target nucleic acid sequencewherein the variant to be enriched is present in the sample in lowabundance amongst a large excess of the other variant; providing anoligonucleotide that is complementary to one strand of the targetnucleic acid sequence, wherein the oligonucleotide has a mismatch at thesingle nucleotide position with the variant to be enriched and isperfectly matched at the single nucleotide position with the othervariant; providing conditions suitable for hybridization of theoligonucleotide to the target nucleic acid to generate duplexpolynucleotides consisting of the oligonucleotide and one strand ofeither variant of the target nucleic acid sequence; contacting theduplex polynucleotides with a mismatch intercalating compound that isattached with an affinity label to generate a reaction mixture, whereinsaid mismatch intercalating compound is capable of binding to the duplexpolynucleotides that contain a mismatch and is not capable of binding tothe duplex polynucleotides that do not contain a mismatch; subjectingthe reaction mixture to an affinity matrix that recognizes and binds tothe affinity label on the mismatch intercalating compound; washing thereaction mixture 
and separating the affinity matrix from all material that is not bound to the affinity matrix; and providing a buffer to elute nucleic acid from the affinity matrix, and collecting the eluted buffer which contains the enriched variant of the target nucleic acid sequence. Additionally or alternatively, detection of a genetic biomarker can include a method for detecting a mutant allele of a target nucleic acid sequence in a mixture of nucleic acids from a sample wherein the mutant allele differs from a wild-type allele at a single nucleotide position and is present in the sample in low abundance amongst a large excess of the wild-type allele, the method comprising, enriching the mutant allele in the sample wherein the enrichment is performed by providing an oligonucleotide that is complementary to one strand of the target nucleic acid sequence, wherein the oligonucleotide has a mismatch at the single nucleotide position with the mutant allele and is perfectly matched at the single nucleotide position with the wild-type allele; providing conditions suitable for hybridization of the oligonucleotide to the target nucleic acid to generate duplex polynucleotides consisting of the oligonucleotide and one strand of either the mutant allele or the wild-type allele; contacting the duplex polynucleotides with a mismatch intercalating compound that is attached with an affinity label to generate a reaction mixture, wherein the mismatch intercalating compound is capable of binding to the duplex polynucleotides that contain a mismatch and is not capable of binding to the duplex polynucleotides that do not contain a mismatch; subjecting the reaction mixture to an affinity matrix that recognizes and binds to the affinity label on the mismatch intercalating compound; washing the reaction mixture and separating the affinity matrix from all material that is not bound to the affinity matrix; and providing a buffer to elute nucleic acid from the affinity matrix, and collecting the eluted buffer which contains the 
enriched mutant allele; amplifying theenriched mutant allele; and detecting the product of the enrichedamplified mutant allele or the signal generated from the enrichedamplified mutant allele. Additionally or alternatively, detection of agenetic biomarker can include a method of enriching a variant of atarget nucleic acid sequence in a mixture of nucleic acids from asample, the target nucleic acid existing in the form of two variantsequences, wherein said variants differ at a single nucleotide position,the method comprising: providing the sample that includes the targetnucleic acid sequence wherein the variant to be enriched is present inthe sample in low abundance amongst a large excess of the other variant;heating the sample such that the mixture of nucleic acid is denatured;providing conditions suitable for the reannealing of the target nucleicacid, wherein duplex polynucleotides can be formed between one strand ofone variant sequence and one strand of the other variant sequence togenerate a mismatch at the single nucleotide position where the variantsdiffer; contacting the duplex polynucleotides with a mismatchintercalating compound that is attached with an affinity label togenerate a reaction mixture, wherein said mismatch intercalatingcompound is capable of binding to the duplex polynucleotides thatcontain a mismatch and is not capable of binding to the duplexpolynucleotides that do not contain a mismatch; subjecting the reactionmixture to an affinity matrix that recognizes and binds to the affinitylabel on the mismatch intercalating compound; washing the reactionmixture and separating the affinity matrix from all material that is notbound to the affinity matrix; and providing a buffer to elute nucleicacid from the affinity matrix, and collecting the eluted buffer whichcontains the enriched variant of the target nucleic acid sequence.Additionally or alternatively, detection of a genetic biomarker caninclude a method for detecting a mutant allele of a target 
nucleic acidsequence in a mixture of nucleic acids from a sample wherein the mutantallele differs from a wild-type allele at a single nucleotide positionand is present in the sample in low abundance amongst a large excess ofthe wild-type allele, the method comprising: enriching the mutant allelein the sample wherein the enrichment is performed by: heating the samplesuch that the mixture of nucleic acid is denatured; providing conditionssuitable for the reannealing of the target nucleic acid, wherein duplexpolynucleotides can be formed between one strand of the mutant alleleand one strand of the wild-type allele to generate a mismatch at thesingle nucleotide position where the alleles differ; contacting theduplex polynucleotides with a mismatch intercalating compound that isattached with an affinity label to generate a reaction mixture, whereinsaid mismatch intercalating compound is capable of binding to the duplexpolynucleotides that contain a mismatch and is not capable of binding tothe duplex polynucleotides that do not contain a mismatch; subjectingthe reaction mixture to an affinity matrix that recognizes and binds tothe affinity label on the mismatch intercalating compound; washing thereaction mixture and separating the affinity matrix from all materialthat is not bound to the affinity matrix; and providing a buffer toelute nucleic acid from the affinity matrix, and collecting the elutedbuffer which contains the enriched variant of the target nucleic acidsequence; amplifying the enriched mutant allele; and detecting theproduct of the enriched amplified mutant allele or the signal generatedfrom the enriched amplified mutant allele.
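As an idealized, non-limiting illustration of why the denaturation and reannealing approach above can enrich a low-abundance allele, the following sketch computes the expected fraction of mismatched (mutant/wild-type) duplexes and the resulting fold enrichment. It assumes random pairing of complementary strands, complete capture of mismatched duplexes, and no capture of perfectly matched duplexes; these assumptions and the function name are hypothetical, and real recoveries would be expected to be lower.

```python
def reannealing_enrichment(mutant_fraction):
    """Idealized enrichment after denaturation, reannealing, and mismatch capture.

    With mutant strand fraction f and random strand pairing:
      - mismatched duplexes form with probability 2 * f * (1 - f);
      - each captured mismatched duplex carries one mutant and one wild-type
        strand, so the mutant fraction among captured strands is 0.5;
      - mutant/mutant duplexes (probability f * f) contain no mismatch and are
        not captured, so a fraction (1 - f) of mutant strands is recovered.
    """
    f = mutant_fraction
    if not 0.0 < f < 1.0:
        raise ValueError("mutant_fraction must be strictly between 0 and 1")
    heteroduplex_fraction = 2.0 * f * (1.0 - f)
    captured_mutant_fraction = 0.5
    fold_enrichment = captured_mutant_fraction / f
    mutant_recovery = 1.0 - f
    return heteroduplex_fraction, fold_enrichment, mutant_recovery


if __name__ == "__main__":
    hetero, fold, recovery = reannealing_enrichment(0.01)
    print(hetero, fold, recovery)
```

Under these assumptions the enriched material cannot exceed a mutant fraction of one half, which is consistent with the subsequent amplifying and detecting steps recited above.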

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 9,238,832, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includea method of allele-specific amplification of a variant of a targetsequence, the target existing in the form of several variant sequences,the method comprising (a) hybridizing a first and a secondoligonucleotides to at least one variant of the target sequence; whereinthe first oligonucleotide is at least partially complementary to one ormore variants of the target sequence, and the second oligonucleotide isat least partially complementary to one or more variants of the targetsequence, and has at least one internal selective nucleotidecomplementary to only one variant of the target sequence; (b) extendingthe second oligonucleotide with a nucleic acid polymerase, wherein saidpolymerase is capable of extending said second oligonucleotidepreferentially when said selective nucleotide forms a base pair with thetarget, and substantially less when said selective nucleotide does notform a base pair with the target. 
Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a variant of a target sequence, the target existing in the form of several variant sequences, the method comprising (a) hybridizing a first and a second oligonucleotide to at least one variant of the target sequence; wherein said first oligonucleotide is at least partially complementary to one or more variants of the target sequence and said second oligonucleotide is at least partially complementary to one or more variants of the target sequence, and has at least one internal selective nucleotide complementary to only one variant of the target sequence; (b) extending the second oligonucleotide with a nucleic acid polymerase; wherein said polymerase is capable of extending said second oligonucleotide preferentially when said selective nucleotide forms a base pair with the target, and substantially less when said selective nucleotide does not form a base pair with the target; and (c) detecting the products of said oligonucleotide extension, wherein the extension signifies the presence of the variant of a target sequence to which the oligonucleotide has a complementary selective nucleotide. Additionally or alternatively, detection of a genetic biomarker can include an oligonucleotide for performing an allele-specific amplification of a target sequence, said target existing in the form of several variant sequences, the oligonucleotide comprising (a) a sequence at least partially complementary to a portion of one or more variants of said target sequence; and (b) at least one internal selective nucleotide complementary to only one variant of the target sequence.
Additionally or alternatively, detection of a genetic biomarker can include a reaction mixture for allele-specific amplification of a target sequence, said target existing in the form of several variant sequences, the mixture comprising (a) a first oligonucleotide, at least partially complementary to one or more variants of the target sequence; and (b) a second oligonucleotide, at least partially complementary to one or more variants of the target sequence but having at least one internal selective nucleotide complementary to only one variant of the target sequence.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2016/0092630, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include accurate and fast mapping of sequencing reads obtained from targeted sequencing. For example, once a target region is selected, alternate regions of the genome that are sufficiently similar to the target region can be identified. If a sequencing read is more similar to the target region than to an alternate region, then the read can be determined as aligning to the target region. The reads aligning to the target region can then be analyzed to determine whether a mutation exists in the target region. Accordingly, a sequencing read can then be compared to the target region and the corresponding alternate regions, and not to the entire genome, thereby providing computational efficiency. Additionally or alternatively, detection of a genetic biomarker can include a method that detects variants in a target region of a sample genome of an organism. A plurality of sequence reads are received. The sequence reads are obtained from sequencing genomic segments in a sample obtained from the organism, where the sequencing includes targeting genomic segments from the target region. One or more alternate regions that have a respective first number of variations from the target region of a reference genome are identified. Each respective first number is greater than one and less than a first threshold number. A computer system performs an alignment of the plurality of sequence reads to the target region of the reference genome to identify a set of sequence reads that align to the target region of the reference genome with less than a second threshold number of variations.
Sequence reads that align to one of the alternate regions with a second number of variations that is less than a third threshold number can be removed from the set. The remaining sequence reads of the set are analyzed to determine variants in the target region of the sample genome.
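The read-filtering logic described above can be illustrated with a minimal sketch. It is not the implementation of the referenced publication: the function names (`min_mismatches`, `filter_reads`) are hypothetical, "variations" are approximated as substitution mismatches only (no insertions or deletions), and alignment is approximated by an ungapped sliding window over short example strings.

```python
from typing import List


def min_mismatches(read: str, region: str) -> int:
    """Smallest number of substitution mismatches between `read` and any
    same-length window of `region` (ungapped; an assumption of this sketch)."""
    n = len(read)
    if n > len(region):
        return n  # the read cannot fit in the region; treat as fully mismatched
    return min(
        sum(a != b for a, b in zip(read, region[i:i + n]))
        for i in range(len(region) - n + 1)
    )


def filter_reads(reads: List[str], target: str, alternates: List[str],
                 second_threshold: int, third_threshold: int) -> List[str]:
    """Keep reads that align to the target with fewer than `second_threshold`
    variations, then remove reads that align to any alternate region with
    fewer than `third_threshold` variations."""
    kept = []
    for read in reads:
        if min_mismatches(read, target) >= second_threshold:
            continue  # does not align to the target region closely enough
        if any(min_mismatches(read, alt) < third_threshold for alt in alternates):
            continue  # aligns too well to an alternate region; removed from the set
        kept.append(read)
    return kept


# Hypothetical toy data: one target region and one similar alternate region.
target = "ACGTACGTAC"
alternates = ["ACGTTCGAAC"]
reads = ["ACGTACGT", "ACGTTCGA", "TTTTTTTT"]
remaining = filter_reads(reads, target, alternates,
                         second_threshold=3, third_threshold=1)
```

Because each read is compared only against the target region and its alternate regions, and not against an entire genome, the cost per read depends on the size of those regions alone, which is the computational saving the passage describes.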

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,977,108, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include approaches for rapidly and reliably detecting and differentiating between mutant and non-mutant forms of nucleic acids that comprise repetitive nucleotide sequences. In certain embodiments, for example, the methods are used to assess microsatellite instability in patients as part of diagnostic or prognostic applications. In many embodiments, various polymorphisms of a given repetitive nucleotide sequence are detected using a single probe nucleic acid. Additionally or alternatively, detection of a genetic biomarker can include a method of detecting a mutant form of a target nucleic acid. The method includes providing at least one target nucleic acid and/or an amplicon of the target nucleic acid. The target nucleic acid includes at least one repetitive nucleotide sequence. The method also includes binding (e.g., hybridizing, etc.) at least one probe nucleic acid to the target nucleic acid and/or to the amplicon of the target nucleic acid. The probe nucleic acid includes at least a first nucleotide sequence that is at least substantially complementary to at least a portion of a non-mutant form of the repetitive nucleotide sequence. In addition, the method also includes detecting a bimodal dissociation of the probe nucleic acid from the target nucleic acid and/or from the amplicon of the target nucleic acid. In some embodiments, the detected bimodal dissociation comprises a bimodal distribution of melting peaks. Further, the detected bimodal dissociation generally correlates with at least one mutant form of the repetitive nucleotide sequence. In some embodiments, a detected non-bimodal (e.g., a single mode, etc.) dissociation correlates with a non-mutant form of the repetitive nucleotide sequence.
Typically, the probe nucleic acid, the target nucleic acid, and/or the amplicon of the target nucleic acid includes or is associated with at least one labeling moiety and/or at least one quencher moiety. In these embodiments, the detecting step generally includes detecting a detectable signal produced by the labeling moiety. Moreover, the bimodal dissociation of the probe nucleic acid from the target nucleic acid and/or from the amplicon of the target nucleic acid is typically detected under at least one varied condition, such as a varied temperature or the like. Additionally or alternatively, detection of a genetic biomarker can include a reaction mixture. The reaction mixture includes at least one target nucleic acid and/or an amplicon of the target nucleic acid. The target nucleic acid includes at least one repetitive nucleotide sequence. The reaction mixture also includes at least one probe nucleic acid that includes at least a first nucleotide sequence that is at least substantially complementary to at least a portion of a non-mutant form of the repetitive nucleotide sequence. Further, the probe nucleic acid dissociates bimodally from a bound target nucleic acid that includes at least one mutant form of the repetitive nucleotide sequence under at least one varied condition. In certain embodiments, the reaction mixture also includes various other components. For example, the reaction mixture optionally includes at least one salt (e.g., NaCl, KCl, and/or the like). In some embodiments, the reaction mixture also includes at least one buffer. The buffer typically maintains a pH of the reaction mixture between about 5.5 and about 10.0. The reaction mixture also optionally includes at least one cofactor, such as Mg2+ (e.g., MgSO4, MgCl2, etc.), Mn2+ (e.g., MnSO4, MnCl2, etc.), and/or the like. Additionally or alternatively, detection of a genetic biomarker can include a probe nucleic acid.
The probe nucleic acid includes at least a first nucleotide sequence that is at least substantially complementary to at least a portion of a non-mutant form of a repetitive nucleotide sequence. In addition, the probe nucleic acid dissociates bimodally from a bound target nucleic acid that includes at least one mutant form of the repetitive nucleotide sequence under at least one varied condition. Additionally or alternatively, detection of a genetic biomarker can include a system for detecting mutant forms of target nucleic acids. The system includes at least one probe nucleic acid that includes at least a first nucleotide sequence that is at least substantially complementary to a non-mutant form of a repetitive nucleotide sequence. The probe nucleic acid dissociates bimodally from a bound target nucleic acid that comprises at least one mutant form of the repetitive nucleotide sequence under at least one varied condition. Typically, at least one container comprises the probe nucleic acid, e.g., in solution. The system also includes at least one detector that detects dissociation of the probe nucleic acid from a target nucleic acid and/or from an amplicon of the target nucleic acid when the probe nucleic acid is bound to the target nucleic acid and/or to the amplicon of the target nucleic acid and subjected to one or more varied conditions. In some embodiments, the system also includes at least one thermal modulator that modulates temperatures to which the probe nucleic acid is exposed when the probe nucleic acid is bound to the target nucleic acid and/or to the amplicon of the target nucleic acid to effect the varied conditions.
In certain embodiments, the system also includes at least one controller operably connected at least to the detector, which controller correlates detected bimodal dissociations of the probe nucleic acid from bound target nucleic acids and/or bound amplicons of target nucleic acids with diagnoses of at least one genetic disorder and/or at least one disease state for subjects from which the target nucleic acids were obtained. In some embodiments, the target nucleic acid typically comprises a DNA or an RNA, and is generally obtained from at least one subject. Mutant forms of the target nucleic acid typically correlate with a diagnosis of at least one genetic disorder (e.g., Fragile X Syndrome, etc.) and/or at least one disease state (e.g., at least one form of cancer, etc.) for a subject comprising the mutant form of the target nucleic acid. Further, the mutant form of the repetitive nucleotide sequence typically comprises at least one deletion relative to the non-mutant form of the repetitive nucleotide sequence. In some embodiments, for example, the repetitive nucleotide sequence corresponds to a microsatellite marker, a mononucleotide repeat, and/or the like. In some embodiments, the repetitive nucleotide sequence comprises at least one mononucleotide repeat (e.g., An, Tn, Gn, Cn, Un, etc., where n is an integer greater than 1). For example, the mononucleotide repeat optionally comprises a BAT-25 repeat or a BAT-26 repeat, among many others. In certain embodiments, detected mutant forms of the mononucleotide repeat comprise 22 or fewer adenine nucleotides. To further illustrate, the repetitive nucleotide sequence of the target nucleic acid includes at least one AT repeat, at least one GC repeat, at least one CGG repeat, at least one CGC repeat, at least one TAT repeat, at least one ATT repeat, and/or at least one complementary repeat thereof in certain embodiments.
In some embodiments, for example, the first nucleotide sequence is longer than the non-mutant form of the repetitive nucleotide sequence. In these embodiments, the portion of the first nucleotide sequence that extends beyond the length of the non-mutant form of the repetitive nucleotide sequence is typically not substantially complementary to nucleotide sequences of the target nucleic acid that are adjacent to the repetitive nucleotide sequence. While not being constrained to a particular theory, in these embodiments it is thought that at least one segment of the probe nucleic acid forms a triple helix when the probe nucleic acid is bound to the mutant form of the target nucleic acid or to an amplicon of the mutant form of the target nucleic acid under at least one selected condition. In certain embodiments, the probe nucleic acid comprises at least one modified nucleotide. The probe nucleic acids, target nucleic acids, and/or amplicons of the target nucleic acids (e.g., via primer nucleic acids used to produce the amplicons, etc.) optionally comprise or are associated with at least one labeling moiety and/or at least one quencher moiety. To illustrate, the labeling moiety optionally comprises one or more of, e.g., a fluorescent dye, a weakly fluorescent label, a non-fluorescent label, a colorimetric label, a chemiluminescent label, a bioluminescent label, an antibody, an antigen, biotin, a hapten, an enzyme, or the like. To further exemplify, the fluorescent dye is optionally selected from the group consisting of, e.g., Cy3, Cy3.5, Cy5, Cy5.5, JOE, VIC, TET, HEX, FAM, R6G, R110, TAMRA, ROX, SYBR-Green, EtBr, and the like.
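As an illustration of how a bimodal distribution of melting peaks might be recognized computationally, the following minimal sketch counts peaks in the negative first derivative of fluorescence with respect to temperature. It is a sketch under stated assumptions, not the method of the referenced patent: the function names (`melting_peaks`, `is_bimodal`), the `min_rel_height` cutoff, and the synthetic two-transition curve are all hypothetical, and the input is assumed to be noise-free (real melt data would typically need smoothing first).

```python
import math
from typing import List


def melting_peaks(temps: List[float], fluorescence: List[float],
                  min_rel_height: float = 0.2) -> List[float]:
    """Return temperatures of local maxima in -dF/dT (finite differences).
    Peaks below `min_rel_height` times the tallest peak are ignored; this
    cutoff is an arbitrary assumption of the sketch."""
    deriv = [-(fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
             for i in range(len(temps) - 1)]
    mids = [(temps[i] + temps[i + 1]) / 2 for i in range(len(temps) - 1)]
    top = max(deriv)
    if top <= 0:
        return []  # fluorescence never decreases; no melting transition seen
    return [mids[i] for i in range(1, len(deriv) - 1)
            if deriv[i] > deriv[i - 1] and deriv[i] >= deriv[i + 1]
            and deriv[i] >= min_rel_height * top]


def is_bimodal(temps: List[float], fluorescence: List[float]) -> bool:
    """True when exactly two melting peaks are found."""
    return len(melting_peaks(temps, fluorescence)) == 2


def sigmoid_melt(t: float, tm: float) -> float:
    """Synthetic single melting transition centered at `tm` (hypothetical)."""
    return 1.0 / (1.0 + math.exp(t - tm))


temps = [40 + 0.5 * i for i in range(81)]  # 40.0 to 80.0 in 0.5 degree steps
two_transitions = [sigmoid_melt(t, 55) + sigmoid_melt(t, 65) for t in temps]
one_transition = [sigmoid_melt(t, 60) for t in temps]
peaks = melting_peaks(temps, two_transitions)
```

In this toy setting, the curve with two transitions would be classified as bimodal and the single-transition curve as non-bimodal, mirroring the mutant versus non-mutant correlation described above.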

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 7,745,125, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods relating to nucleic acid polymerization and amplification. In certain embodiments, for example, the pyrophosphorolysis activated polymerization (PAP)-related methods involve the serial coupling of pyrophosphorolysis and polymerization. These methods can be used, e.g., for SNP analysis and rare somatic mutation detection, among many other applications. In some embodiments, the methods enhance the general specificity of oligonucleotide-mediated synthesis reactions. For example, analogous to other “hot start” methods (e.g., reversible, chemically-modified enzymes, aptamer- or antibody-mediated “hot start”), “zeroth cycle extension” (pre-PCR) is reduced or eliminated. Unlike these other methods, primer activation is effected at each and every new oligonucleotide-mediated synthesis step. This improves the overall specificity of the reaction, minimizing the generation of unwanted side products. Accordingly, the detection of low copy and even single copy sequences is improved. In addition, the performance in multiplex amplification reactions (where several or many different targets are being amplified) is also improved by reducing or eliminating the generation of unintended and undesired, non-specific synthesis products, e.g., primer dimers in the case of PCR. Additionally or alternatively, detection of a genetic biomarker can include a reaction mixture that includes at least one oligonucleotide (e.g., a primer nucleic acid, a probe nucleic acid, etc.) comprising a 2′-terminator nucleotide (e.g., at a 3′-terminus). In certain embodiments, the oligonucleotide comprises the formula:

where Z is O or CH2; B is at least one homocyclic ring, at least one heterocyclic ring, at least one aryl group, or combinations thereof; BG is a blocking group; R1 is H, OH, a hydrophilic group, or a hydrophobic group; X is a nucleotide or a nucleotide analog; n is an integer greater than 0; and, represents a single or double bond. Optionally, the oligonucleotide comprises at least one label. In certain embodiments, at least one nucleotide position in the oligonucleotide corresponds to a polymorphic nucleotide position in a target nucleic acid. In some of these embodiments, for example, the 2′-terminator nucleotide corresponds to the polymorphic nucleotide position in the target nucleic acid. The reaction mixture typically includes additional reagents according to the particular application in which the reaction mixture is utilized. In some embodiments, for example, additional reagents are selected from, e.g., a first biocatalyst comprising a nucleotide removing activity (e.g., a pyrophosphorolysis activity and/or a nuclease activity), a second biocatalyst comprising a nucleotide incorporating activity, a target nucleic acid comprising at least a subsequence that is at least partially complementary to the oligonucleotide, an amplicon, a primer nucleic acid, a probe nucleic acid (e.g., a hybridization probe, a 5′-nuclease probe, a hairpin probe, etc.), an additional nucleotide (e.g., an extendible nucleotide, a terminator nucleotide, a ribonucleoside triphosphate, a deoxyribonucleoside triphosphate, etc.), an additional oligonucleotide (e.g., a primer nucleic acid, a probe nucleic acid, etc.), a soluble light emission modifier, a cosolvent, an intercalating agent, a clinical specimen, a sample, a buffer, a salt, a metal ion, pyrophosphate, glycerol, dimethyl sulfoxide, poly rA, and the like. In some embodiments, the target nucleic acid, the amplicon, the primer nucleic acid, the probe nucleic acid, the additional nucleotide, and/or the additional oligonucleotide comprises at least one label.
In certain embodiments, the buffer comprises N-[Tris(hydroxymethyl)methyl]glycine at a concentration of at least 90 mM (e.g., about 95 mM, about 100 mM, about 105 mM, etc.). In some embodiments, the first biocatalyst comprises a nucleotide incorporating activity (i.e., in addition to the nucleotide removing activity). The nucleotide incorporating activity of the first and/or the second biocatalyst typically comprises a polymerase activity and/or a ligase activity. Optionally, the first and/or the second biocatalyst comprises a nuclease activity. To further illustrate, the first and/or second biocatalyst optionally comprises an enzyme selected from, e.g., a polymerase, a terminal transferase, a reverse transcriptase, a polynucleotide phosphorylase, a ligase, an AP endonuclease, and a telomerase. In certain embodiments, the first and/or second biocatalyst comprises a CS5 DNA polymerase that includes one or more mutations at amino acid positions selected from the group consisting of: G46, L329, Q601, D640, I669, S671, and E678. In some of these embodiments, for example, the mutations comprise a G46E mutation, an L329A mutation, a Q601R mutation, a D640G mutation, an I669F mutation, an S671F mutation, and/or an E678G mutation. In some embodiments, for example, the 2′-terminator nucleotide comprises a 2′-monophosphate-3′-hydroxyl nucleoside. Additionally or alternatively, detection of a genetic biomarker can include a method of removing a nucleotide from an oligonucleotide. The method includes incubating at least one target nucleic acid with: at least a first biocatalyst comprising a nucleotide removing activity (e.g., a pyrophosphorolysis activity and/or a nuclease activity), and at least one oligonucleotide (e.g., a primer nucleic acid, a probe nucleic acid, etc.)
comprising a 2′-terminator nucleotide (e.g., at a 3′-terminus), which oligonucleotide is at least partially complementary to at least a first subsequence of the target nucleic acid, under conditions whereby the first biocatalyst removes at least the 2′-terminator nucleotide from the oligonucleotide to produce a removed 2′-terminator nucleotide and a shortened oligonucleotide, thereby removing the nucleotide from the oligonucleotide. In some embodiments, the method includes incubating the target nucleic acid with the first biocatalyst, the oligonucleotide, and pyrophosphate, which pyrophosphate is added to the removed 2′-terminator nucleotide. In some exemplary embodiments, the target nucleic acid comprises at least one polymorphic nucleotide position, and the method comprises detecting removal of the 2′-terminator nucleotide from the oligonucleotide, which removal correlates with the oligonucleotide comprising at least one nucleotide position that corresponds to the polymorphic nucleotide position. In these embodiments, the 2′-terminator nucleotide typically corresponds to the polymorphic nucleotide position. In certain embodiments, the oligonucleotide comprises at least one label, and the method comprises detecting a detectable signal emitted from the label. In some of these embodiments, the label comprises a donor moiety and/or an acceptor moiety and the detectable signal comprises light emission, and the method comprises incubating the target nucleic acid with the first biocatalyst, the oligonucleotide, and at least one soluble light emission modifier and detecting the light emission from the donor moiety and/or the acceptor moiety. Optionally, the 2′-terminator nucleotide comprises the donor moiety and/or the acceptor moiety.
In some embodiments, the first biocatalyst comprises a nucleotide incorporating activity (i.e., in addition to the nucleotide removing activity), and the method comprises incubating the target nucleic acid with the first biocatalyst, the shortened oligonucleotide, and at least one additional nucleotide under conditions whereby the first biocatalyst incorporates the additional nucleotide at a terminus of the shortened oligonucleotide to produce an extended oligonucleotide. Optionally, the method comprises incubating the target nucleic acid with at least a second biocatalyst comprising a nucleotide incorporating activity, the shortened oligonucleotide, and at least one additional nucleotide under conditions whereby the second biocatalyst incorporates the additional nucleotide at a terminus of the shortened oligonucleotide to produce an extended oligonucleotide. To illustrate, the nucleotide incorporating activity typically includes a polymerase activity and/or a ligase activity. The first and/or second biocatalyst typically comprises an enzyme selected from, e.g., a polymerase, a terminal transferase, a reverse transcriptase, a polynucleotide phosphorylase, a ligase, an AP endonuclease, a telomerase, and the like. In certain embodiments, the first and/or second biocatalyst comprises a CS5 DNA polymerase comprising one or more mutations at amino acid positions selected from, e.g., G46, L329, Q601, D640, I669, S671, and E678. In some of these embodiments, the mutations comprise a G46E mutation, an L329A mutation, a Q601R mutation, a D640G mutation, an I669F mutation, an S671F mutation, and/or an E678G mutation. In some embodiments, one or more nucleotides of the oligonucleotide extend beyond a terminus of the target nucleic acid when the oligonucleotide and the target nucleic acid hybridize to form a hybridized nucleic acid. In some embodiments, at least one additional oligonucleotide comprises the additional nucleotide.
The additional nucleotide comprises an extendible nucleotide and/or a terminator nucleotide. In certain embodiments, the additional nucleotide comprises at least one label, and the method comprises detecting a detectable signal emitted from the label. For example, the label optionally comprises a donor moiety and/or an acceptor moiety and the detectable signal comprises light emission, and the method comprises incubating the target nucleic acid with at least one soluble light emission modifier and detecting the light emission from the label. Additionally or alternatively, detection of a genetic biomarker can include incubating the target nucleic acid with at least one probe nucleic acid that comprises at least one label, which probe nucleic acid is at least partially complementary to at least a second subsequence of the target nucleic acid, and detecting a detectable signal emitted from the label of the probe nucleic acid or a fragment thereof. In some embodiments, the detectable signal comprises light emission, and the method further comprises incubating the target nucleic acid with at least one soluble light emission modifier and detecting the light emission from the label. For example, the probe nucleic acid optionally comprises a 5′-nuclease probe and the first and/or second biocatalyst extends the shortened oligonucleotide in a 5′ to 3′ direction and comprises a 5′ to 3′ exonuclease activity. Optionally, the probe nucleic acid comprises a hybridization probe and/or a hairpin probe. In certain embodiments, the target nucleic acid comprises at least one polymorphic nucleotide position, and the method comprises detecting extension of the shortened oligonucleotide, which extension correlates with the extended oligonucleotide comprising at least one nucleotide position that corresponds to the polymorphic nucleotide position. In some of these embodiments, the 2′-terminator nucleotide corresponds to the polymorphic nucleotide position.
Additionally or alternatively, detection of a genetic biomarker can include a system that includes (a) at least one container or support comprising an oligonucleotide that comprises a 2′-terminator nucleotide. The system also includes at least one of: (b) at least one thermal modulator configured to thermally communicate with the container or the support to modulate temperature in the container or on the support; (c) at least one fluid transfer component that transfers fluid to and/or from the container or the support; and, (d) at least one detector configured to detect detectable signals produced in the container or on the support. In some embodiments, the system includes at least one controller operably connected to: the thermal modulator to effect modulation of the temperature in the container or on the support, the fluid transfer component to effect transfer of the fluid to and/or from the container or the support, and/or the detector to effect detection of the detectable signals produced in the container or on the support.

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2018/0135103, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include methods based on digital polymerase chain reaction (dPCR) in combination with a reference sample which is used in a double function. First, it is added to a dPCR run as an external standard. Secondly, the same reference sample is used as an internal standard, preferably by adding it to the primary sample. It runs through the whole sample preparation process in the same way as the nucleic acid of interest (target nucleic acid). Both the internal and the external reference are quantified using dPCR. The ratio of internal vs external reference quantification gives the yield of the sample preparation prior to the dPCR. Knowing this yield, the initial target concentration in the primary sample can be calculated. The reference used with dPCR leads to a full understanding of standards used in dPCR and helps prevent miscalculation due to pipetting and dilution errors. Even with non-precise standards, the absolute accuracy of dPCR is further improved and standards may be re-calibrated as a bonus.
Additionally or alternatively, detection of a genetic biomarker can include a method for determining the amount or concentration of a nucleic acid of interest in an unprocessed sample, the method comprising the steps of: a) providing an unprocessed sample suspected of containing the nucleic acid of interest and a reference sample known to contain a reference nucleic acid, which is different from the nucleic acid of interest; b) combining the unprocessed sample with a defined amount of the reference sample, thereby obtaining a combined sample; c) processing the combined sample, thereby obtaining a processed sample suitable for digital polymerase chain reaction (dPCR); d) performing dPCR with the processed sample, thereby determining the amount or concentration of the nucleic acid of interest and the amount or concentration of the reference nucleic acid in the processed sample; e) performing the dPCR with a defined amount of the reference sample, thereby determining the amount or concentration of the reference nucleic acid in the defined amount of the reference sample; f) comparing the amount or concentration of the reference nucleic acid determined in step d) to that determined in step e), thereby determining the yield of the nucleic acid in step c); and g) determining the amount or concentration of the nucleic acid of interest in the unprocessed sample based on the amount or concentration of the nucleic acid of interest in the processed sample determined in step d) and the yield determined in step f).
Additionally or alternatively, detection of a genetic biomarker can include a method for determining the amount or concentration of a nucleic acid of interest in an unprocessed sample, the method comprising the steps of: a) providing an unprocessed sample suspected of containing the nucleic acid of interest; b) providing a reference sample known to contain a reference nucleic acid, which is different from the nucleic acid of interest; c) processing the reference sample, thereby obtaining a processed reference sample suitable for dPCR; d) performing the dPCR with the processed reference sample, thereby determining the amount or concentration of the reference nucleic acid in the processed reference sample; e) performing the dPCR with a defined amount of unprocessed reference sample, thereby determining the amount or concentration of the reference nucleic acid in the defined amount of the unprocessed reference sample; f) comparing the amount or concentration of the reference nucleic acid determined in step d) to that determined in step e), thereby determining the yield of the nucleic acid in step c); g) processing the unprocessed sample, thereby obtaining a processed sample suitable for dPCR, wherein the processing steps c) and g) are identical; h) performing the dPCR with the processed sample, thereby determining the amount or concentration of the nucleic acid of interest; and i) determining the amount or concentration of the nucleic acid of interest in the unprocessed sample based on the amount or concentration of the nucleic acid of interest in the processed sample determined in step h) and the yield determined in step f).
In some embodiments, (i) the amount or concentration of the reference nucleic acid in the reference sample is compared to a reference value, thereby controlling the reference sample; (ii) the amount or concentration of the reference nucleic acid in the reference sample is unknown or not predetermined; and/or (iii) the amount or concentration of the reference sample in step e) is identical to that in step b). In addition, the reference nucleic acid has one or more of the following characteristics: (i) is a nucleic acid selected from the group consisting of DNA, cDNA, RNA and a mixture thereof; (ii) has the same primer binding site as the nucleic acid of interest; (iii) has a primer binding site different from that of the nucleic acid of interest; (iv) has a length in nucleic acids that differs from that of the nucleic acid of interest by at most 50%, at most 25%, at most 10% or at most 5%; (v) has a sequence that is at least 50% identical, at least 60%, at least 70% or at least 80% identical to that of the nucleic acid of interest; (vi) has a content of G and C that differs from that of the nucleic acid of interest by at most 50%, at most 25%, at most 10% or at most 5%; and (vii) comprises a part that is not part of the nucleic acid of interest and that is used for detecting the reference nucleic acid.
Moreover, the nucleic acid of interest has one or more of the following characteristics: (i) is a nucleic acid selected from the group consisting of DNA, cDNA, RNA and a mixture thereof; (ii) comprises a part that is not part of the reference nucleic acid and that is used for detecting the nucleic acid of interest; and (iii) is indicative of a microorganism, a cell, a virus, a bacterium, a fungus, a mammal species, a genetic status or a disease. Still further, the unprocessed sample has one or more of the following characteristics: (i) has been obtained from a cell culture, a source suspected of being contaminated or a subject, particularly wherein the subject is selected from the group consisting of a human, an animal and a plant, especially a human; and (ii) is selected from the group consisting of a body fluid, blood, blood plasma, blood serum, urine, bile, cerebrospinal fluid, a swab, a clinical specimen, an organ sample and a tissue sample. The processing step can include one or more of the following processes: dilution, lysis, centrifugation, extraction, precipitation, filtration, and purification. In some embodiments, dPCR is characterized by one or more of the following: (i) is carried out in a liquid, in a gel, in an emulsion, in a droplet, in a microarray of miniaturized chambers, in a chamber of a microfluidic device, in a microwell plate, on a chip, in a capillary, on a nucleic acid binding surface or on a bead, especially in a microarray or on a chip; (ii) is carried out identically in at least 100 reaction areas, particularly at least 1,000 reaction areas, especially at least 5,000 reaction areas; and (iii) is carried out identically in at least 10,000 reaction areas, particularly at least 50,000 reaction areas, especially at least 100,000 reaction areas. More specifically, steps d) and e) are carried out in the same dPCR run and/or on the same dPCR device.
In a further embodiment, dPCR comprises using one or more fluorescent probes, alone or in combination with a quencher, to detect the nucleic acid of interest and/or the reference nucleic acid. In a specific embodiment, the fluorescent probe comprises fluorescein, rhodamine, or cyanine. In this embodiment, the determining step comprises detecting a fluorescent signal. In some embodiments, an external control is used. In a specific embodiment, the method is used to diagnose the presence or absence of a disease, a pathogen, a rare genetic sequence, a rare mutation, a copy number variation or relative gene expression. Optionally, the method is used to monitor disease progression, therapeutic response, and combinations thereof.
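
As an illustration of how counts from many identical dPCR reaction areas might be turned into a relative quantity, the following minimal sketch applies the standard Poisson correction (mean copies per reaction area = -ln(1 - fraction of positive areas)) to the nucleic acid of interest and to the reference nucleic acid measured in the same run. The function names and the example counts are hypothetical and are not taken from the cited method.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson-corrected mean number of copies per reaction area.

    Assumes template molecules are distributed randomly across 'total'
    identical reaction areas, of which 'positive' gave a signal.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def target_to_reference_ratio(target_positive: int, reference_positive: int, total: int) -> float:
    """Ratio of the nucleic acid of interest to the reference nucleic acid (hypothetical helper)."""
    reference = copies_per_partition(reference_positive, total)
    if reference == 0.0:
        raise ValueError("reference nucleic acid was not detected")
    return copies_per_partition(target_positive, total) / reference


# Hypothetical run with 10,000 reaction areas read in two fluorescence channels.
ratio = target_to_reference_ratio(target_positive=1200, reference_positive=2400, total=10_000)
print(f"target/reference ratio: {ratio:.3f}")
```

The correction matters most when a large fraction of reaction areas is positive, because a single positive area may then contain more than one template molecule.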

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Patent Application Publication No. 2014/0128270, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a method of interrogating a sequence of a target nucleic acid having sense and anti-sense strands by a microarray analysis comprising a sequence determination computation, comprising omitting from the computation a signal from one of the sense and anti-sense strands for one or more nucleotide positions in the target nucleic acid sequence. In variations of this embodiment, omitting the signal from one of the sense and anti-sense strands at a nucleotide position comprises the steps of: using a plurality of microarrays, measuring hybridization signals at the nucleotide position using one or more probe sets for each of the sense and the anti-sense strands; for each probe set, determining base discrimination ability by comparing the hybridization signals within each probe set; for each nucleotide position, computing discrimination ability for the sense and anti-sense strands separately using the computed discrimination ability from each of the probe sets; for each nucleotide position, comparing the computed discrimination ability between the sense and the anti-sense strands; and omitting the signal from the strand with lower base discrimination ability. In variations of this embodiment, the base discrimination is measured using Formula 1. In further variations of this embodiment, the discrimination ability for the sense and anti-sense strands is computed as a percentile of the discrimination ability for probe sets in the strand at the base position. In yet further variations of this embodiment, the discrimination ability between the sense and anti-sense strands is compared using Formula 3:

(1) $W_{75i} < Q_{75j} - T$, for $Q_{75i} < PT$

(2) $Q_{75i} < A(Q_{75j} - B)^{2} + PT$, for $Q_{75i} \geq PT$    (Formula 3)

Additionally or alternatively, detection of a genetic biomarker caninclude a method of detecting the presence or absence of a targetnucleic acid having a sense and an anti-sense strands in a test sampleusing a microarray analysis including a sequence determination ormutation detection computation, comprising omitting from the computationa signal from one of the sense and anti-sense strands for one or morenucleotide positions in the target nucleic acid sequence. In variationsof this embodiment, omitting the signal from one of the sense andanti-sense strands at a nucleotide position comprises the steps of:using a plurality of microarrays, measuring hybridization signals at thenucleotide position using one or more probe sets for each of the senseand the anti-sense strands; for each probe set, determining basediscrimination by comparing the hybridization signals within each probeset; for each nucleotide position, computing discrimination ability forsense and antisense strand separately using discrimination ability fromeach of the probe sets; for each nucleotide position, comparingdiscrimination ability between the sense and the anti-sense strands;omitting the signal from a strand with lower base discriminationability. In variations of this embodiment, the base discrimination ismeasured using Formula 1. 
In further variations of this embodiment, thediscrimination ability of the sense and anti-sense strand is computed asa percentile of the discrimination ability for probe sets in the strandat the base position measured using a plurality of microarrays.Additionally or alternatively, detection of a genetic biomarker caninclude a computer readable medium including code for controlling one ormore processors to detect the presence or absence of a target nucleicacid having a sense and an anti-sense strands in a test sample using amicroarray analysis that includes a sequence determination or mutationdetection computation, comprising omitting from the computation a signalfrom one of the sense and anti-sense strands for one or more nucleotidepositions in the target nucleic acid sequence. In variations of thisembodiment, the computer readable medium comprises a code controllingthe steps of: using a plurality of microarrays, measuring hybridizationsignals at the nucleotide position using one or more probe sets for eachof the sense and the anti-sense strands; for each probe set, determiningbase discrimination ability by comparing the hybridization signalswithin each probe set; for each nucleotide position, computingdiscrimination ability for sense and antisense strand separately usingdiscrimination ability from each of the probe sets; for each nucleotideposition, comparing discrimination ability between the sense and theanti-sense strands; omitting the signal from a strand with lower basediscrimination ability. 
Additionally or alternatively, detection of agenetic biomarker can include a system for detecting a target nucleicacid in a test sample comprising: a data acquisition module configuredto acquire hybridization data from a microarray; a data processing unitconfigured to process the data to determine a target nucleotide sequenceby omitting the signal from one of the sense and anti-sense strands atone or more nucleotide positions in the target sequence via the stepsof: using a plurality of microarrays, measuring hybridization signals atthe nucleotide position using one or more probe sets for each of thesense and the anti-sense strands; for each probe set, determining basediscrimination ability by comparing the hybridization signals withineach probe set; for each nucleotide position, computing discriminationability for sense and antisense strand separately using discriminationability from each of the probe sets; for each nucleotide position,comparing discrimination ability between the sense and the anti-sensestrands; omitting the signal from a strand with lower basediscrimination ability; and a display module configured to display thedata produced by the data processing unit.
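
The strand-omission steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the computation of the cited publication: Formula 1 is not reproduced in this section, so `probe_set_discrimination` uses a hypothetical stand-in (the relative gap between the two strongest probe signals), the 75th percentile is an assumed choice, and the strands are compared directly rather than with Formula 3.

```python
from statistics import quantiles


def probe_set_discrimination(signals: dict) -> float:
    """Hypothetical stand-in for Formula 1: relative gap between the two
    strongest hybridization signals among the A, C, G and T probes."""
    top, second = sorted(signals.values(), reverse=True)[:2]
    return (top - second) / (top + second) if (top + second) > 0 else 0.0


def strand_score(probe_sets: list, percentile: int = 75) -> float:
    """Percentile of per-probe-set discrimination for one strand at one position,
    pooled over all microarrays (the 75th percentile is an assumption)."""
    scores = [probe_set_discrimination(s) for s in probe_sets]
    if len(scores) == 1:
        return scores[0]
    return quantiles(scores, n=100, method="inclusive")[percentile - 1]


def strand_to_keep(sense_sets: list, antisense_sets: list) -> str:
    """Keep the strand with the higher discrimination ability; the signal from
    the other strand is omitted from the base call at this position."""
    return "sense" if strand_score(sense_sets) >= strand_score(antisense_sets) else "antisense"


# Hypothetical signals at one nucleotide position, two probe sets per strand.
sense = [{"A": 1000, "C": 100, "G": 90, "T": 80}, {"A": 900, "C": 120, "G": 100, "T": 95}]
antisense = [{"A": 500, "C": 450, "G": 90, "T": 80}, {"A": 480, "C": 470, "G": 100, "T": 95}]
print(strand_to_keep(sense, antisense))
```

In this example the sense-strand probe sets separate the called base from the runner-up far more clearly, so the anti-sense signal would be left out of the computation.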

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Patent Application Publication No. 2002/0160404, which is herebyincorporated by reference in its entirety. For example, detection of agenetic biomarker can include a method for the amplification of nucleicacid fragments from a sample comprising two or three thermocyclicamplification reactions in which completely randomized primers are usedin the first amplification reaction and specific primers are used in thesecond amplification reaction, characterized in that, to amplify theDNA, a mixture of at least two DNA polymerases is used, at least one ofwhich possesses 3′-5′ exonuclease activity. An amplification reactioncan comprise about 20 to 60 thermal cycles. The first amplificationreaction preferably comprises at least 40 thermal cycles and, mostpreferably, at least 50 thermal cycles. The second amplificationreaction preferably comprises at least 30 thermal cycles, and mostpreferably, at least 40 thermal cycles. Each thermal cycle comprises adenaturing phase, an annealing phase, and at least one elongation phase.Denaturation into single strands preferably takes place at temperaturesof between 90° C. and 96° C. The annealing phase to hybridize theprimers with the target nucleic acid preferably takes place attemperatures of between 30° C. and 50° C. Most preferably, the annealingphase takes place at temperatures of between 35° C. and 45° C. Duringthe first amplification reaction, the annealing phase most preferablytakes place at about 37° C. The elongation phase is carried out attemperatures of between 50° C. and 75° C. In a preferred embodiment, theelongation phase of the first amplification reaction takes place attemperatures of between 50° C. and 60° C. A temperature of about 55° C.is especially preferred. 
It is advantageous for the elongation to becarried out during the first amplification reaction in the majority ofcycles using two or more elongation steps, with one elongation carriedout at a lower temperature and then continuing the elongation at ahigher temperature. Using this approach, populations of especially longamplicons are created during the first amplification reaction. In thisembodiment, the first amplification reaction preferably takes place atabout 55° C., and the second amplification reaction takes place at about65° C. to 72° C. A temperature of about 68° C. is optimal. The primersused in the first amplification reaction are completely randomized,i.e., a population of single-stranded oligonucleotides is used in whichevery single nucleotide on every single position can comprise one offour nucleotide components A, T, G, or C. These primers are preferably10-20 nucleotides long. Most preferably, the primers are about 15nucleotides long. The specific primers used in the second amplificationreaction are characterized in that they have a sequence that isidentical to a sequence of the target nucleic acid or its complementarysequence over a range of at least 10 nucleotides. The specific primersused to carry out a “nested PCR” in a potential third amplificationreaction are selected according to the same criteria as the primers usedin the second amplification reaction. The sequences of the primers usedthat are identical to the target nucleic acid or its complement must bea component of the sequence amplified in the second amplificationreaction. The mixture of DNA polymerases preferably contains athermostable DNA polymerase without 3′-5′ exonuclease activity such asTaq DNA polymerase, for instance, and another thermostable DNApolymerase with 3′-5′ exonuclease activity, such as Pwo DNA polymeraseobtained from Pyrokokkus woesii (Boehringer Mannheim order no. 
1644947). Other DNA polymerases without 3′-5′ exonuclease activity can also be used as a component of the polymerase mixture. Additionally or alternatively, detection of a genetic biomarker can include a method for DNA amplification. To ensure the sensitivity of detecting certain sequences, it is advantageous to carry out the cell analysis of the material to be analyzed using enzymatic protease digestion to obtain the sample DNA. Proteinase K can be used, for instance. Additionally or alternatively, detection of a genetic biomarker can include a method in which RNA is first isolated from the physical material to be analyzed. The sample of physical material can comprise one cell, fewer than 10 cells, or fewer than 100 cells. To obtain RNA, it is preferable to use chemical lysis using buffers that contain guanidinium isothiocyanate. A corresponding cDNA is then created using a reverse transcriptase reaction. This cDNA is then used as the starting material for the primer-extension preamplification. The cDNA is preferably obtained via reverse transcription of poly-A RNA. The use of polymerase mixtures in the primer-extension preamplification PCR leads to a surprisingly high sensitivity of DNA detection that cannot be achieved using the methods known from the prior art. Additionally or alternatively, detection of a genetic biomarker can include a method for the amplification of nucleic acid fragments comprising two or three thermocyclic amplification reactions. Completely randomized primers are used in the first amplification reaction and specific primers are used in the second amplification reaction. In addition, the sample contains a quantity of nucleic acid corresponding to an equivalent of no more than 100 cells. In some embodiments, the likelihood of the amplificates forming is greater than 90%. In some embodiments, the likelihood is greater than 90% that amplificates will form from an equivalent of no more than 5-10 cells. 
In a special embodiment, the likelihood of amplificates forming from the equivalent of one cell is greater than 50%. The method is suitable for use in the amplification of nucleic acid fragments having a length between 100 and 1000 base pairs. The method is especially suited for use in the amplification of nucleic acid fragments having a length between 150 and 550 base pairs.
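
To make the notion of completely randomized primers concrete, the short sketch below draws 15-nucleotide primers in which every position is chosen independently from A, T, G and C, and computes how many distinct sequences such a pool can contain. It is only an in silico illustration; the function names are hypothetical, and real randomized primer pools are produced by mixed-base oligonucleotide synthesis rather than by software.

```python
import random

NUCLEOTIDES = "ATGC"


def random_primer(length: int = 15, rng=None) -> str:
    """One primer in which each position is drawn uniformly from A, T, G, C."""
    rng = rng or random
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(length))


def pool_complexity(length: int = 15) -> int:
    """Number of distinct sequences possible for a fully randomized primer."""
    return len(NUCLEOTIDES) ** length


rng = random.Random(0)  # fixed seed so the illustration is reproducible
for _ in range(3):
    print(random_primer(15, rng))
print(pool_complexity(15), "possible 15-mers")
```

For the preferred length of about 15 nucleotides the pool can contain 4^15 (roughly one billion) different sequences, which is why such primers can anneal throughout a genome at the low annealing temperatures described above.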

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in U.S. Pat. No. 8,658,572, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a microarray with high density of oligopeptide features, thereby allowing for the detection of protein interactions across an organism's proteome. An embodiment is a microarray comprising at least 50,000 oligopeptide features per cm². Another embodiment is a microarray having oligopeptide features representing at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 99%, or 100% of the proteome of a target selected from a virus or organism. Additionally or alternatively, detection of a genetic biomarker can include a microarray comprising at least 50,000 oligopeptide features per cm² wherein the features represent between about 90% and 100% of a target proteome, the target selected from a virus and an organism, and wherein at least a portion of the features comprise oligopeptides having a terminal 2-(2-nitro-4-benzoyl-phenyl)-propoxycarbonyl (benzoyl-NPPOC)-protected tyrosine residue.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 8,822,158, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includea method for treating multiple nucleic acid molecules of interestcomprising in the steps of: (a) providing a plurality of beads,characterized in that each bead comprises at least one pair of sequencespecific amplification primers, further characterized in that at leastone of said primers is bound to the bead via a photo-cleavable linker,(b) capturing the nucleic acid molecules of interest from a sample, (c)clonally isolating said plurality of beads, (d) photo-cleaving said atleast one primer, (e) clonally amplifying said nucleic acid therebycreating multiple amplification products, and (f) analyzing saidamplification products. In a first major embodiment, step c) comprisesthe generation of an emulsion wherein each bead is encapsulated in asingle micelle. Preferably, step f) comprises the distribution of saidplurality of beads into the cavities of a micro- or picotiter plate anddetecting said amplification products. In a first particular embodiment,step f) further comprises a sequencing reaction of said amplificationproducts. Preferably, said sequencing reaction is a sequencing bysynthesis reaction, for example a pyrosequencing reaction. In case themultiple nucleic acid molecules are variants of the same type of nucleicacid, such a method may be used for quantitative mutational analysis. Incase the plurality of molecules corresponds to a plurality of differentcellular RNAs or their corresponding cDNAs, such a method may be usedfor monitoring gene expression. In a second particular embodiment, thegeneration of said amplification products for example by means of PCR ismonitored. 
Preferably, said amplification products are detected by means of a fluorescent entity that binds specifically to double-stranded DNA or a sequence-specific hybridization probe. Furthermore, said amplification products may be analyzed by means of subjecting said amplification products to a thermal gradient. In a second major embodiment, step c) comprises the distribution of said plurality of beads into the cavities of a micro- or picotiter plate. Preferably, steps e) and f) are performed simultaneously by means of real-time PCR. Subsequent to PCR, a melting curve analysis may be performed. In some embodiments, at least one primer which is bound to the bead via a cleavable linker carries a detectable tag. Preferably, said detectable tag is selected from a group consisting of mass-tag, color label, e-tag and a hapten which is detectable by an antibody. Highly preferred is a fluorescent label which is preferably quenched as long as said labeled primer has not been elongated. Alternatively, the cleavable primer carries a detectable tag. In this case, said amplification products are detected using labeled primers or labeled dNTPs. For example, the detectable tag may be a hapten such as biotin or digoxigenin. In a particular embodiment, each member of the plurality of primers which are bound to the bead via a cleavable linker carries a different detectable label.

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Patent Application Publication No. 2015/0024948, which is herebyincorporated by reference in its entirety. For example, detection of agenetic biomarker can include a system and methods for the enrichmentand analysis of nucleic acid sequences. Additionally or alternatively,detection of a genetic biomarker can include the enrichment of targetedsequences in a format by representing one fusion partner gene on acapturing platform and allowing subsequent sequencing of chimericnucleic acids such as nucleic acid strands that carry information ondifferent DNA regions of a genome. Additionally or alternatively,detection of a genetic biomarker can include a method for detectingbalanced chromosomal aberrations in a genome is provided. The methodcomprises the steps of: (a) exposing fragmented, denatured nucleic acidmolecules of said genome to multiple, different oligonucleotide probeslocated on multiple, different sites of a solid support underhybridizing conditions to capture nucleic acid molecules thatspecifically hybridize to said probes, wherein said fragmented,denatured nucleic acid molecules have an average size of about 100 toabout 1000 nucleotide residues, preferably about 250 to about 800nucleotide residues and most preferably about 400 to about 600nucleotide residues, in particular about 500 nucleotide residues,wherein said oligonucleotide probes have an average size of about 20 toabout 100 nucleotides, preferably about 40 to about 85 nucleotides, morepreferred about 45 to about 75 nucleotides, in particular about 55 toabout 65 nucleotide residues or about 60 nucleotide residues, (b)separating unbound and non-specifically hybridized nucleic acids fromthe captured molecules; (c) eluting the captured molecules from thesolid support, (d) optionally repeating steps (a) to (c) for at leastone further cycle with the eluted 
captured molecules, (e) determiningthe nucleic acid sequence of the captured molecules, in particular bymeans of performing sequencing by synthesis reactions, (f) comparing thedetermined sequence to sequences in a database of the reference genome,(g) identifying sequences in the determined sequence which onlypartially match or do not match with sequences of the reference genome,(h) detecting at least one balanced chromosomal aberration. Additionallyor alternatively, detection of a genetic biomarker can includepre-selected, immobilized nucleic acid probes for capturing targetnucleic acid sequences from, for example, a genomic sample byhybridizing the sample to probes on a solid support is provided.According to some embodiments, the captured target nucleic acids may bewashed and eluted off of the probes. In some cases, the eluted genomicsequences may be more amenable to detailed genetic analysis than asample that has not been subjected to the methods. Additionally oralternatively, detection of a genetic biomarker can include the solutionbased capture method comprising probe derived amplicons wherein saidprobes for amplification are affixed to a solid support. The solidsupport comprises support-immobilized nucleic acid probes to capturespecific nucleic acid sequences from a genomic sample. Probeamplification provides probe amplicons in solution which are hybridizedto target sequences. Following hybridization of probe amplicons totarget sequences, target nucleic acid sequences present in the sampleare enriched by capturing and washing the probes and eluting thehybridized target nucleic acids from the captured probes. The targetnucleic acid sequence(s) may be further amplified using, for example,non-specific ligation-mediated PCR (LM-PCR), resulting in an amplifiedpool of PCR products of reduced complexity compared to the originaltarget sample which is further analysed by sequencing as describedabove. 
Additionally or alternatively, detection of a genetic biomarkercan include a method for detecting balanced chromosomal aberrations in agenome of an organism is provided. The method comprises the steps ofexposing fragmented, denatured nucleic acid molecules of the genome to aplurality of oligonucleotide probes bound to different positions of asolid support. The nucleic acid molecules have an average size of about100 to about 1000 nucleotide residues and the oligonucleotide probeshave an average size of about 20 to about 100 nucleotide residues. Themethod also includes the step of separating nucleic acid molecules boundto one or more of the oligonucleotide probes from nucleic acid moleculesnot bound to one or more of the oligonucleotide probes and then elutingthe nucleic acid molecules bound to one or more of the oligonucleotideprobes from the solid support. Thereafter, the nucleic acid moleculeswhich were eluted in the step of eluting are sequenced, thereby gettinga determined sequence for the nucleic acid molecules. Also, the methodincludes the step of comparing the determined sequence to a databasecomprising a reference genome sequence and identifying sequences in thedetermined sequence which only partially match or do not match withsequences of the reference genome, thereby detecting at least onebalanced chromosomal aberration. In some embodiments, theoligonucleotide probes include a linker for binding to the solidsupport. In various embodiments, the linker may comprise a chemicallinker. In some embodiments, the method may further include the steps ofligating at least one adaptor molecule to at least one end of thenucleic acid molecules prior to step exposing and amplifying the nucleicacid molecules which bound to one or more of the oligonucleotide probeswith at least one primer comprising a sequence which specificallyhybridizes to the adaptor molecule, whereby the step of amplifying iscarried out after the step of eluting. 
Further, according to some embodiments, the solid support is either a nucleic acid microarray or a population of beads. In some embodiments, a method of detecting balanced chromosomal aberrations in a genome is provided. The method includes the steps of providing a solid support comprising a plurality of different oligonucleotide probes bound to different positions of the solid support, wherein the oligonucleotide probes have an average size of about 20 to about 100 nucleotides, and providing a plurality of fragmented and denatured nucleic acid molecules having an average size of about 100 to about 1000 nucleotide residues. The method also includes the step of amplifying the oligonucleotide probes, thereby generating amplification products including a binding moiety and being maintained in solution. Thereafter, the method includes the steps of hybridizing the target nucleic acid molecules to the amplification products in solution under specific hybridizing conditions, thereby generating a plurality of hybridization complexes, and separating the hybridization complexes from nucleic acid molecules not hybridized to the amplification products. Next, according to the method, the hybridized target nucleic acid molecules are separated from the amplification product comprising the hybridization complex and sequenced, whereby a determined sequence for the nucleic acid molecules is obtained. According to the method, the determined sequence is compared to a database comprising a reference genome, and sequences in the determined sequence which only partially match or do not match with sequences of the reference genome are identified in order to detect at least one balanced chromosomal aberration. In some embodiments, the binding moiety is a biotin moiety. According to some embodiments, oligonucleotide probes having highly repetitive sequences are not used. Further, in some embodiments, the balanced chromosomal aberrations identified may include translocations or inversions.
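
The comparison step, in which determined sequences that only partially match or do not match the reference genome are identified, can be illustrated with a toy classifier. This is a minimal sketch using exact substring matching on made-up sequences; the function name, the `min_anchor` parameter and the two-contig reference are all hypothetical, and a practical analysis would rely on a dedicated read aligner rather than this simplification.

```python
def classify_read(read: str, reference: dict, min_anchor: int = 8) -> str:
    """Classify a sequenced fragment against a toy reference genome.

    'match'    : the whole read occurs contiguously in the reference
    'partial'  : only the first or last min_anchor bases occur in the reference
                 (a candidate breakpoint-spanning fragment)
    'no_match' : neither end of the read is found
    """
    if any(read in seq for seq in reference.values()):
        return "match"
    prefix, suffix = read[:min_anchor], read[-min_anchor:]
    if any(prefix in seq or suffix in seq for seq in reference.values()):
        return "partial"
    return "no_match"


# Hypothetical two-contig reference and three hypothetical reads.
reference = {
    "chrA": "ACGTTGCAAGGCTTAACCGGTTAA",
    "chrB": "TTTTCCCCGGGGAAAATTTTCCCC",
}
reads = {
    "contiguous": "TGCAAGGCTTAACC",   # lies entirely within chrA
    "chimeric": "TGCAAGGCCCCCGGGG",   # chrA segment joined to a chrB segment
    "unrelated": "GAGAGAGAGAGAGAGA",  # found in neither contig
}
for name, read in reads.items():
    print(name, classify_read(read, reference))
```

Reads classified as partial are the ones of interest here, since a fragment spanning a translocation or inversion breakpoint matches the reference on each side of the junction but not as a whole.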

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin U.S. Pat. No. 6,514,736, which is hereby incorporated by reference inits entirety. For example, detection of a genetic biomarker can includea process for amplifying one or more specific nucleic acid sequencespresent in a nucleic acid or mixture thereof using primers and athermostable enzyme. The extension product of one primer when hybridizedto the other becomes a template for the production of the desiredspecific nucleic acid sequence, and vice versa, and the process isrepeated as often as is necessary to produce the desired amount of thesequence. The method improves the specificity of the amplificationreaction, resulting in a very distinct signal of amplified nucleic acid.In addition, the method eliminates the need for transferring reagentsfrom one vessel to another after each amplification cycle. Suchtransferring is not required because the thermostable enzyme willwithstand the high temperatures required to denature the nucleic acidstrands and therefore does not need replacement. The temperature cyclingmay, in addition, be automated for further reduction in manpower andsteps required to effectuate the amplification reaction. 
Additionally oralternatively, detection of a genetic biomarker can include a processfor amplifying at least one specific nucleic acid sequence contained ina nucleic acid or a mixture of nucleic acids, wherein if the nucleicacid is double-stranded, it consists of two separated complementarystrands of equal or unequal length, which process comprises: (a)contacting each nucleic acid strand with four different nucleosidetriphosphates and one oligonucleotide primer for each different specificsequence being amplified, wherein each primer is selected to besubstantially complementary to different strands of each specificsequence, such that the extension product synthesized from one primer,when it is separated from its complement, can serve as a template forsynthesis of the extension product of the other primer, said contactingbeing at a temperature which promotes hybridization of each primer toits complementary nucleic acid strand; (b) contacting each nucleic acidstrand, at the same time as or after step (a), with a thermostableenzyme which enables combination of the; nucleoside triphosphates toform primer extension products complementary to each strand of eachnucleic acid; (c) maintaining the mixture from step (b) at an effectivetemperature for an effective time to activate the enzyme, and tosynthesize, for each different sequence being amplified, an extensionproduct of each primer which is complementary to each nucleic acidstrand template, but not so high (a temperature) as to separate eachextension product from its complementary strand template; (d) heatingthe mixture from step (c) for an effective time and at an effectivetemperature to separate the primer extension products from the templateson which they were synthesized to produce single-stranded molecules, butnot so high (a temperature) as to denature irreversibly the enzyme; (e)cooling the mixture from step (d) at an effective temperature for aneffective time to promote hybridization of each primer to each of 
thesingle-stranded molecules produced in step (d); and (f) maintaining themixture from step (e) at an effective temperature for an effective timeto promote the activity of the enzyme and to synthesize, for eachdifferent sequence being amplified, an extension product of each primerwhich is complementary to each nucleic acid strand template produced instep (d), but not so high (a temperature) as to separate each extensionproduct from its complementary strand template, wherein steps (e) and(f) are carried out simultaneously or sequentially. The steps (d), (e)and (f) may be repeated until the desired level of sequenceamplification is obtained. The preferred thermostable enzyme is apolymerase extracted from Thermus aquaticus (Taq polymerase). Mostpreferably, if the enzyme is Taq polymerase, in step (a) the nucleicacid strands are contacted with a buffer comprising about 1.5-2 mM of amagnesium salt, 150-200 μM each of the nucleotides, and 1 μM of eachprimer, steps (a), (e) and (f) are carried out at about 45-58° C., andstep (d) is carried out at about 90-100° C. In a preferred embodiment,the nucleic acid(s) are double-stranded and step (a) is accomplished by(i) heating each nucleic acid in the presence of four differentnucleoside triphosphates and one oligonucleotide primer for eachdifferent specific sequence being amplified, for an effective time andat an effective temperature to denature each nucleic acid, wherein eachprimer is selected to be substantially complementary to differentstrands of each specific sequence, such that the extension productsynthesized from one primer, when it is separated from its complement,can serve as a template for synthesis of the extension product of theother primer; and (ii) cooling the denatured nucleic acids to atemperature which promotes hybridization of each primer to itscomplementary nucleic acid strand. 
Additionally or alternatively,detection of a genetic biomarker can include a process for detecting thepresence or absence of at least one specific nucleic acid sequence in asample containing a nucleic acid or mixture of nucleic acids, ordistinguishing between two different sequences in said sample, whereinthe sample is suspected of containing said sequence or sequences, andwherein if the nucleic acid(s) are double-stranded, they each consist oftwo separated complementary strands of equal or unequal length, whichprocess comprises steps (a) to (f) mentioned above, resulting inamplification in quantity of the specific nucleic acid sequence(s), ifpresent; (g) adding to the product of step (f) a labeled oligonucleotideprobe, for each sequence being detected, capable of hybridizing to saidsequence or to a mutation thereof; and (h) determining whether saidhybridization has occurred. Additionally or alternatively, detection ofa genetic biomarker can include a process for detecting the presence orabsence of at least one nucleotide variation in sequence in one or morenucleic acids contained in a sample, wherein if the nucleic acid isdouble-stranded it consists of two separated complementary strands ofequal or unequal length, which process comprises steps (a)-(f) mentionedabove, wherein steps (d), (e) and (f) are repeated a sufficient numberof times to result in detectable amplification of the nucleic acidcontaining the sequence, if present; (g) affixing the product of step(f) to a membrane; (h) treating the membrane under hybridizationconditions with a labeled sequence-specific oligonucleotide probecapable of hybridizing with the amplified nucleic acid sequence only ifa sequence of the probe is complementary to a region of the amplifiedsequence; and (i) detecting whether the probe has hybridized to anamplified sequence in the nucleic acid sample. If the sample comprisescells, preferably they are heated before step (a) to expose the nucleicacids therein to the reagents. 
This step avoids extraction of the nucleic acids prior to reagent addition. In a variation of this process, the primer(s) and/or nucleoside triphosphates are labeled so that the resulting amplified sequence is labeled. The labeled primer(s) and/or nucleoside triphosphate(s) can be present in the reaction mixture initially or added during a later cycle. The sequence-specific oligonucleotide (unlabeled) is affixed to a membrane and treated under hybridization conditions with the labeled amplification product so that hybridization will occur only if the membrane-bound sequence is present in the amplification product. Additionally or alternatively, detection of a genetic biomarker can include a process for cloning into a cloning vector one or more specific nucleic acid sequences contained in a nucleic acid or a mixture of nucleic acids, which nucleic acid(s) when double-stranded consist of two separated complementary strands, and which nucleic acid(s) are amplified in quantity before cloning, which process comprises steps (a)-(f) mentioned above, with steps (d), (e) and (f) being repeated a sufficient number of times to result in detectable amplification of the nucleic acid(s) containing the sequence(s); (g) adding to the product of step (f) a restriction enzyme for each of said restriction sites to obtain cleaved products in a restriction digest; and (h) ligating the cleaved product(s) of step (g) containing the specific sequence(s) to be cloned into one or more cloning vectors containing a promoter and a selectable marker. 
Additionally oralternatively, detection of a genetic biomarker can include a processfor cloning into a cloning vector one or more specific nucleic acidsequences contained in a nucleic acid or mixture of nucleic acids, whichnucleic acid(s), when double-stranded, consist of two separatedcomplementary strands of equal or unequal length which nucleic acid(s)are amplified in quantity before cloning, which process comprises steps(a)-(f) mentioned above, with steps (d), (e) and (f) being repeated asufficient number of times to result in effective amplification of thenucleic acid(s) containing the sequence(s) for blunt-end ligation intoone or more cloning vectors; and (g) ligating the amplified specificsequence(s) to be cloned obtained from step (f) into one or more of saidcloning vectors in the presence of a ligase, said amplified sequence(s)and vector(s) being present in sufficient amounts to effect theligation. Additionally or alternatively, detection of a geneticbiomarker can include a composition of matter useful in amplifying atleast one specific nucleic acid sequence contained in a nucleic acid ora mixture of nucleic acids, comprising four different nucleosidetriphosphates and one oligonucleotide primer for each different specificsequence being amplified, wherein each primer is selected to besubstantially complementary to different strands of each specificsequence, such that the extension product synthesized from one primer,when it is separated from its complement, can serve as a template forsynthesis of the extension product of the other primer. Additionally oralternatively, detection of a genetic biomarker can include a sample ofone or more nucleic acids comprising multiple strands of a specificnucleic acid sequence contained in the nucleic acid(s). The sample maycomprise about 10-100 of the strands, about 100-1000 of the strands, orover about 1000 of the strands. 
Additionally or alternatively, detection of a genetic biomarker can include an amplified nucleic acid sequence from a nucleic acid or mixture of nucleic acids comprising multiple copies of the sequence produced by the amplification processes above.
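
By way of illustration only, the repetition of steps (d), (e) and (f) until a desired level of sequence amplification is obtained can be sketched numerically. The following minimal Python sketch assumes an idealized model in which each cycle multiplies the number of target copies by (1 + efficiency). The function names, the efficiency parameter, and the cycle cap are hypothetical and are not taken from the processes described above.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after repeated denature/anneal/extend cycles.

    efficiency is an assumed fraction of templates copied per cycle
    (1.0 means perfect doubling); it is an illustrative parameter only.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(initial_copies, target_copies, efficiency=1.0, max_cycles=60):
    """Smallest cycle count at which the modeled copy number reaches the target.

    Returns None if the target is not reached within max_cycles.
    """
    if initial_copies <= 0:
        raise ValueError("initial_copies must be positive")
    for cycles in range(max_cycles + 1):
        if copies_after_cycles(initial_copies, cycles, efficiency) >= target_copies:
            return cycles
    return None


if __name__ == "__main__":
    # Hypothetical input: 100 starting strands, one million desired copies.
    print(cycles_to_reach(100, 1_000_000))
```

Under this assumed model, 100 starting strands with perfect doubling reach one million copies after 14 cycles. Real reactions plateau and have efficiencies below 1, so the sketch indicates only the order of magnitude of the number of repetitions.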

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2017/181134, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include techniques for identifying driver genes,mutations, and/or pathways for various types of cancer. For example, theidentified driver genes may be used for diagnosis by identifyingmutations occurring on the identified driver genes, or for treatment bytargeting the identified driver genes. In some embodiments, a drivergene may be identified by determining a gene-specific backgroundmutation rate. In some embodiments, a statistical model forgene-specific background mutation rate may be determined by optimizingparameters estimated from single-gene and cross-genes modeling. In oneexample, the gene-specific background mutation can be statisticallydetermined by recursively optimizing a gene-specific mean and agene-specific dispersion using negative binomial regression and Bayesianinference. 
Genes, mutations, and/or pathways that have significantlymore mutations than the expected background mutations across samples maybe identified as candidate driver genes, mutations, and/or pathways.Additionally or alternatively, detection of a genetic biomarker caninclude a method comprising: for each sample of a plurality of samplesfrom different subjects having a same type of cancer, receiving a set ofone or more mutations in DNA measured in the sample, the DNA including aplurality of genes; for each sample of the plurality of samples,determining a sample mutation rate based on a total number of mutationsmeasured in the sample; for each mutation context of a plurality ofmutation contexts, determining a context mutation rate based on a firstnumber of mutations identified in the sets of mutations for the mutationcontext, wherein a mutation context corresponds to a type ofsubstitution or deletion; for each gene of the plurality of genes,determining, for each sample of the plurality of samples, a secondnumber of silent mutations measured in the gene in the sample;determining an expected silent mutation rate using a sum of contextmutation rates of silent mutations in the gene, wherein a silentmutation does not cause a change to an amino acid sequence of atranslated protein for the gene; determining a probability distributionof gene-specific background mutation rate across the plurality ofsamples for the gene based on the expected silent mutation rate for thegene and the sample mutation rates of the plurality of samples, whereindetermining the probability distribution of gene-specific backgroundmutation rate for the gene includes: optimizing one or more parametersof the probability distribution of gene-specific background mutationrate for the gene to increase a fit of the probability distribution tothe second number of silent mutations; determining an expectednon-silent mutation rate using a sum of context mutation rates of asubset of non-silent mutations in the gene; 
determining an expected number of samples having at least one non-silent mutation using the expected non-silent mutation rate and the probability distribution of the gene-specific background mutation rate for the gene; and comparing the expected number to the measured number of samples having at least one non-silent mutation to obtain a likelihood value for the measured number; and identifying a group of genes having likelihood values above a threshold as candidate driver genes.
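
The comparison of an expected number of mutated samples to a measured number can be illustrated with a simplified sketch. The Python example below is not the statistical model of the incorporated publication, which refers to negative binomial regression and Bayesian inference. It assumes, for illustration only, that non-silent mutations in a gene arise as a Poisson process whose rate is the product of a sample mutation rate, an expected non-silent mutation rate, and a fixed gene-specific factor. All function and parameter names are hypothetical.

```python
import math


def prob_at_least_one(sample_rate, expected_nonsilent_rate, gene_factor=1.0):
    """Probability that a sample carries at least one non-silent mutation in a gene.

    Assumes Poisson-distributed counts (an illustrative simplification).
    """
    lam = sample_rate * expected_nonsilent_rate * gene_factor
    return 1.0 - math.exp(-lam)


def expected_mutated_samples(probs):
    """Expected number of samples with at least one non-silent mutation."""
    return sum(probs)


def upper_tail(probs, observed):
    """P(K >= observed), where K is a sum of independent Bernoulli(probs) trials.

    The exact distribution of K is built up one sample at a time.
    """
    dist = [1.0]  # dist[k] is the probability of k mutated samples so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return sum(dist[observed:])


if __name__ == "__main__":
    # Hypothetical per-sample mutation rates for one gene.
    sample_rates = [0.5, 1.0, 2.0, 4.0]
    probs = [prob_at_least_one(r, 0.05) for r in sample_rates]
    print(expected_mutated_samples(probs), upper_tail(probs, 3))
```

In this sketch, a small tail probability indicates that the measured number of mutated samples exceeds what the assumed background would produce. How such a value maps onto the likelihood value and threshold recited above is left open here.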

In some embodiments, detection of a genetic biomarker (e.g., one or moregenetic biomarkers) can include any of the variety of methods describedin P.C.T. Publication No. WO 2017/201315, which is hereby incorporatedby reference in its entirety. For example, detection of a geneticbiomarker can include an automated nucleic acid amplification methodwhich may comprise the following steps: (a) providing at least twodroplets, wherein each droplet comprises primers that anneal to a targetnucleic acid; (b) amplifying the target nucleic acid in each saiddroplet in parallel; (c) quantitating the amplified target nucleic acidin at least one droplet; and (d) after a desired amount of the targetnucleic acid has been obtained, recovering at least one droplet forfurther analyzing or processing of said at least one droplet.Additionally or alternatively, detection of a genetic biomarker caninclude an automated nucleic acid amplification method which maycomprise the following steps: (a) providing at least two droplets,wherein each droplet comprises a target nucleic acid; (b) amplifying thetarget nucleic acid in each said droplet in parallel; (c) quantitatingthe amplified target nucleic acid in at least one droplet; and (d) aftera desired amount of the target nucleic acid has been obtained,recovering at least one droplet for further analyzing or processing ofsaid at least one droplet. In an embodiment, said droplets may beprovided on an electrowetting-based device. 
Additionally or alternatively, detection of a genetic biomarker can include an automated nucleic acid amplification method which comprises the following steps: (a) providing an electrowetting-based device; (b) providing at least two droplets on said electrowetting-based device, wherein each droplet comprises primers that anneal to a target nucleic acid; (c) amplifying the target nucleic acid in each said droplet in parallel; (d) quantitating the amplified target nucleic acid in at least one droplet; and (e) after a desired amount of the target nucleic acid has been obtained, recovering at least one droplet for further analyzing or processing of said at least one droplet. Additionally or alternatively, detection of a genetic biomarker can include an automated nucleic acid amplification method which comprises the following steps: (a) providing an electrowetting-based device; (b) providing at least two droplets on said electrowetting-based device, wherein each droplet comprises a target nucleic acid; (c) amplifying the target nucleic acid in each said droplet in parallel; (d) quantitating the amplified target nucleic acid in at least one droplet; and (e) after a desired amount of the target nucleic acid has been obtained, recovering at least one droplet for further analyzing or processing of said at least one droplet. In some embodiments, said droplets may each comprise a different target nucleic acid. Additionally or alternatively, detection of a genetic biomarker can include a method wherein said droplets each may comprise the same target nucleic acid. Additionally or alternatively, detection of a genetic biomarker can include a method wherein said droplets may comprise a mixture of droplets that contain the same target nucleic acid and different target nucleic acids. 
Additionally or alternatively, detection of a genetic biomarker can include a method wherein the electrowetting-based device may comprise a biplanar configuration of parallel arrays of electrodes to effect electrowetting-mediated droplet manipulations. Some embodiments relate to a method wherein the electrowetting-based device may comprise a planar configuration of electrodes that effects electrowetting-mediated droplet manipulations. Some embodiments relate to a method wherein the electrowetting-based device may comprise square electrodes, optionally wherein said electrodes are about 5 mm by 5 mm. Some embodiments relate to a method wherein the electrowetting-based device may comprise electrodes, wherein said electrodes are square, triangular, rectangular, circular, trapezoidal, and/or irregularly shaped. Some embodiments relate to a method wherein the electrowetting-based device may comprise electrodes wherein said electrodes may comprise electrode dimensions ranging from about 100 μm by 100 μm to about 10 cm by 10 cm. Some embodiments relate to a method wherein the electrowetting-based device may comprise interdigitated electrodes. Some embodiments relate to a method wherein the electrowetting-based device may comprise electrodes, wherein said electrodes may comprise indium tin oxide (“ITO”), transparent conductive oxides (“TCOs”), conductive polymers, carbon nanotubes (“CNT”), graphene, nanowire meshes and/or ultra thin metal films, e.g., ITO. Additionally or alternatively, detection of a genetic biomarker can include a method wherein the detection zone may detect electrochemical and/or fluorescent signals. Additionally or alternatively, detection of a genetic biomarker can include a method wherein said detection zone may detect capacitance of a droplet. An additional embodiment pertains to a method wherein said detection zone may be a fixed location. Another embodiment relates to a method wherein said detection zone may comprise any location within the electrowetting-based device. 
Yet anotherembodiment generally pertains to a method wherein the method ofamplification may comprise hot start PCR. Additionally or alternatively,detection of a genetic biomarker can include a method wherein saidamplification may comprise isothermal amplification. Additionally oralternatively, detection of a genetic biomarker can include a methodwherein said amplification may comprise thermocycling. Saidthermocycling may comprise temperatures ranging from about 50° C. toabout 98° C., e.g., about 50° C., about 60° C., about 65° C., about 72°C., about 95° C., or about 98° C. Said thermocycling may comprise timesranging from about 1 s to about 5 min., e.g., about 1 sec, about 5 sec,about 10 sec, about 20 sec, about 30 sec, about 45 sec, about 1 min,and/or about 5 min. Furthermore, said thermocycling may comprise threethermocycle steps, and said three thermocycle steps may be completed inone minute or less. In some embodiments, each droplet may furthercomprise a detection agent. Additionally or alternatively, detection ofa genetic biomarker can include a method wherein each droplet maycontain the same detection agent. Additionally or alternatively,detection of a genetic biomarker can include a method wherein eachdroplet may contain a different detection agent. Another embodimentgenerally relates to a method wherein the droplets may comprise alabeled subset of droplets wherein each droplet within the subsetcontains an agent for detecting the target nucleic acid, and anunlabeled subset of droplets wherein each droplet within the subset doesnot contain said agent for detecting the target nucleic acid. Anadditional embodiment generally encompasses a method wherein eachdroplet within the subset containing a detection agent may comprise adifferent detection agent. Additionally or alternatively, detection of agenetic biomarker can include a method wherein each droplet within thesubset containing a detection agent may comprise the same detectionagent. 
In some embodiments, the nucleic acid polymerase may be a modified naturally occurring Type A polymerase. A further embodiment generally relates to a method wherein the modified Type A polymerase may be selected from any species of the genus Meiothermus, Thermotoga, or Thermomicrobium. Another embodiment generally pertains to a method wherein the polymerase may be isolated from any of Thermus aquaticus (Taq), Thermus thermophilus, Thermus caldophilus, or Thermus filiformis. A further embodiment generally encompasses a method wherein the modified Type A polymerase may be isolated from Bacillus stearothermophilus, Sphaerobacter thermophilus, Dictyoglomus thermophilum, or Escherichia coli. Additionally or alternatively, detection of a genetic biomarker can include a method wherein the modified Type A polymerase may be a mutant Taq-E507K polymerase. Another embodiment generally pertains to a method wherein a thermostable polymerase may be used to effect amplification of the target nucleic acid. A further embodiment generally relates to a method wherein the thermostable polymerase may be selected from the following: Thermotoga maritima, Thermus aquaticus, Thermus thermophilus, Thermus flavus, Thermus filiformis, Thermus species Sps 17, Thermus species Z05, Thermus caldophilus, Bacillus caldotenax, Thermotoga neapolitana, and Thermosipho africanus. Additionally or alternatively, detection of a genetic biomarker can include a method wherein a modified polymerase may be used to effect amplification of the target nucleic acid, e.g., wherein said modified polymerase may be selected from the following: G46E E678G CS5 DNA polymerase, G46E L329A E678G CS5 DNA polymerase, G46E L329A D640G S671F CS5 DNA polymerase, G46E L329A D640G S671F E678G CS5 DNA polymerase, a G46E E678G CS6 DNA polymerase, Z05 DNA polymerase, ΔZ05 polymerase, ΔZ05-Gold polymerase, ΔZ05R polymerase, E615G Taq DNA polymerase, E678G TMA-25 polymerase, and E678G TMA-30 polymerase. 
Additionally or alternatively, detection of a genetic biomarker can include a method wherein detection of the detection agent may occur at the end of an amplification cycle. In some embodiments, the method may detect a single nucleotide polymorphism. An additional embodiment generally relates to a method wherein the method may be used for amplicon generation. In some embodiments, the method may be used for a melting curve analysis. In some embodiments, the method may be used for target nucleic acid enrichment. In some embodiments, the method may be used for primer extension target enrichment (“PETE”). In some embodiments, the method may be used for library amplification. In some embodiments, the method may be used to quantitate the number of adapter-ligated target nucleic acid molecules during library preparation. Additionally or alternatively, detection of a genetic biomarker can include a method wherein said quantitation of the number of adapter-ligated target nucleic acid molecules may occur (a) after adapter ligation to determine the amount of input material converted to adapter-ligated molecules (conversion rate) and/or the quantity of template used for library amplification; (b) after library amplification, to determine whether a sufficient amount of each library has been generated and/or to ensure equal representation of indexed libraries pooled for target capture or cluster amplification; and/or (c) prior to cluster amplification, to confirm that individual libraries or sample pools are diluted to the optimal concentration for NGS flow cell loading. Additionally, said quantitation of the number of adapter-ligated target nucleic acid molecules may occur after post-ligation cleanup steps (prior to library amplification). 
In some embodiments, after recovering at least one droplet, said further analyzing or processing of said at least one droplet may comprise a nucleic acid sequencing reaction, a next generation sequencing reaction, whole-genome shotgun sequencing, whole exome or targeted sequencing, amplicon sequencing, mate pair sequencing, RIP-seq/CLIP-seq, ChIP-seq, RNA-seq, transcriptome analysis, and/or methyl-seq. Additionally or alternatively, detection of a genetic biomarker can include a method wherein the droplets may be surrounded by a filler fluid, e.g., wherein said filler fluid may be an oil. In some embodiments, said oil may comprise a transparent oil. In some embodiments, said oil may comprise liquid polymerized siloxane, silicone oil, mineral oil, and/or paraffin oil. Another embodiment generally relates to a method wherein the droplets may be surrounded by a gas, e.g., wherein said gas may be air. Yet another embodiment generally relates to a method wherein the method may be used to avoid overamplification bias. Another embodiment generally relates to a method wherein the method may be used to produce a representative sample of a population of mutations. Additionally or alternatively, detection of a genetic biomarker can include a method wherein the method may be used to determine the number of amplification cycles necessary to generate the desired concentration of a target nucleic acid. Additionally or alternatively, detection of a genetic biomarker can include a method wherein the method may be controlled through a computer in communication with the electrowetting-based device. Some embodiments generally relate to a method wherein said method comprises a master mix. 
In some embodiments, said master mix may comprise a polymerase, dNTP(s), MgCl2, and/or oligonucleotide primer(s). In some embodiments, said master mix may comprise dNTP(s) at a concentration comprising from about 1 mM to about 100 mM, e.g., about 1 mM, about 10 mM, or about 100 mM; MgCl2 at a concentration comprising from about 1 mM to about 100 mM, e.g., about 1 mM, about 10 mM, or about 100 mM; and/or an oligonucleotide primer(s) at a concentration comprising from about 1 nM to about 1 mM, e.g., about 1 nM, about 1 μM, or about 1 mM. Additionally or alternatively, detection of a genetic biomarker can include a device for amplification of a target nucleic acid, wherein said device may (a) comprise a biplanar configuration of parallel arrays of electrodes to effect electrowetting-mediated droplet manipulations; (b) comprise or be in contact with at least one heating element; and (c) comprise or be in contact with at least one detection zone. In an embodiment, said heating element may comprise an inductive heating element. Another aspect generally pertains to a device for amplification of a target nucleic acid, wherein said device may (a) comprise a planar configuration of electrodes to effect electrowetting-mediated droplet manipulations; (b) comprise or be in contact with at least one heating element; and (c) comprise or be in contact with at least one detection zone. In an embodiment, said heating element may comprise an inductive heating element. In some embodiments, said electrodes may comprise square shapes, optionally about 5 mm by 5 mm. In some embodiments, said electrodes may comprise square, triangular, rectangular, circular, trapezoidal, and/or irregular shapes. In some embodiments, said electrodes may comprise electrode dimensions ranging from about 100 μm by 100 μm to about 10 cm by 10 cm. In some embodiments, said electrodes may be interdigitated. 
In some embodiments, said electrodes may comprise indium tin oxide (“ITO”), transparent conductive oxides (“TCOs”), conductive polymers, carbon nanotubes (“CNT”), graphene, nanowire meshes and/or ultra thin metal films, e.g., ITO. In some embodiments, said device may comprise droplets that range in volume from about 1 picoliter to about 5 mL, e.g., about 12.5 μl. In some embodiments, said device may comprise a gap between a top plate and a bottom plate of about 0.5 mm. In some embodiments, said device may comprise a plurality of inlet/outlet ports. In some embodiments, said device may comprise between 1 and about 400 inlet/outlet ports for loading and removal of the same sample or of different samples, and/or said device further comprises between 1 and about 100 inlet/outlet ports for the introduction and removal of filler fluid(s). In some embodiments, said device may comprise inlet/outlet ports wherein the spacing between adjacent ports ranges from about 5 mm to about 500 mm. In a further embodiment, said heating element may comprise a contact heater. An additional aspect relates to an embodiment wherein said amplification comprises thermocycling. Said thermocycling may comprise three thermocycle steps, and said three thermocycle steps may be completed in one minute or less. In another embodiment, said amplification may comprise isothermal amplification. In a further embodiment, said amplification may comprise hot start PCR. In yet another embodiment, the detection zone may detect electrochemical and/or fluorescent signals. An additional embodiment pertains to a detection zone that may detect capacitance of a droplet. In a further embodiment, said detection zone may be a fixed location. In another embodiment, said detection zone may comprise any location within the electrowetting-based device. In another embodiment, the target nucleic acid may be provided on the device within at least three droplets. 
In a further embodiment, said droplets may each comprise thesame target nucleic acid. In yet another embodiment, said droplets maycomprise a mixture of droplets that contain the same target nucleic acidand different target nucleic acids. In another embodiment, each dropletmay further comprise a detection agent. In an additional embodiment,each droplet may contain the same detection agent. In anotherembodiment, each droplet may contain a different detection agent.Additionally, in another embodiment, the droplets may comprise a labeledsubset of droplets that each contain an agent for detecting the targetnucleic acid, and an unlabeled subset of droplets that each do notcontain said agent for detecting the target nucleic acid. In anadditional embodiment, each droplet within the subset containing adetection agent may comprise a different detection agent. In anotherembodiment, each droplet within the subset containing a detection agentmay comprise the same detection agent. In a further embodiment, eachsubset of droplets may comprise 1 or more, 2 or more, 10 or more, 100 ormore, 1,000 or more, or 10,000 or more droplets. In some embodiments,the device may detect a single nucleotide polymorphism. In yet anotherembodiment, the device may effect amplicon generation. In an additionalembodiment, the device may effect a melting curve analysis. In yetanother embodiment, the device may effect target nucleic acidenrichment. In yet another embodiment, the device may effect PETE. In anadditional embodiment, the device may effect library amplification. In afurther embodiment, the device may quantitate the number ofadapter-ligated target nucleic acid molecules during librarypreparation. 
For example, said quantitation may occur (a) after adapter ligation to determine the amount of input material converted to adapter-ligated molecules (conversion rate) and/or the quantity of template used for library amplification; (b) after library amplification, to determine whether a sufficient amount of each library has been generated and/or to ensure equal representation of indexed libraries pooled for target capture or cluster amplification; and/or (c) prior to cluster amplification, to confirm that individual libraries or sample pools are diluted to the optimal concentration for NGS flow cell loading. Also, said quantitation may occur after post-ligation cleanup steps (prior to library amplification). In yet another embodiment, after a desired amount of the target nucleic acid has been obtained, at least one droplet may be recovered from said device prior to further analysis or processing of said droplet. For example, said further analyzing or processing of said at least one droplet may comprise a nucleic acid sequencing reaction, a next generation sequencing reaction, whole-genome shotgun sequencing, whole exome or targeted sequencing, amplicon sequencing, mate pair sequencing, RIP-seq/CLIP-seq, ChIP-seq, RNA-seq, transcriptome analysis, and/or methyl-seq. Additionally or alternatively, detection of a genetic biomarker can include a system for automated amplification of a target nucleic acid which may comprise: (a) an electrowetting-based device; (b) at least one heating element that comprises or is in contact with the electrowetting-based device; (c) at least one detection zone that comprises or is in contact with the electrowetting-based device. In an embodiment, said heating element may comprise an inductive heating element. In a further embodiment, said heating element may comprise a contact heater. An additional aspect relates to an embodiment wherein said amplification comprises thermocycling. 
Said thermocycling may comprise three thermocycle steps,and said three thermocycle steps may be completed in one minute or less.In another embodiment, said amplification may comprise isothermalamplification. In a further embodiment, said amplification may comprisehot start PCR. In yet another embodiment, the detection zone may detectelectrochemical and/or fluorescent signals. An additional embodimentpertains to a detection zone that may detect capacitance of a droplet.In a further embodiment, said detection zone may be a fixed location. Inanother embodiment, said detection zone may comprise any location withinthe system. In another embodiment, the target nucleic acid may beprovided on the system within at least three droplets. In a furtherembodiment, said droplets may each comprise the same target nucleicacid. In yet another embodiment, said droplets may comprise a mixture ofdroplets that contain the same target nucleic acid and different targetnucleic acids. In another embodiment, each droplet may further comprisea detection agent. In an additional embodiment, each droplet may containthe same detection agent. In another embodiment, each droplet maycontain a different detection agent. Additionally, in anotherembodiment, the droplets may comprise a labeled subset of droplets thateach contain an agent for detecting the target nucleic acid, and anunlabeled subset of droplets that each do not contain said agent fordetecting the target nucleic acid. In an additional embodiment, eachdroplet within the subset containing a detection agent may comprise adifferent detection agent. In another embodiment, each droplet withinthe subset containing a detection agent may comprise the same detectionagent. In a further embodiment, each subset of droplets may comprise 1or more, 2 or more, 10 or more, 100 or more, 1,000 or more, or 10,000 ormore droplets. 
Additionally or alternatively, detection of a geneticbiomarker can include an automated amplification method which maycomprise (a) providing an electrowetting-based device with a biplanarconfiguration of parallel arrays of electrodes to effectelectrowetting-mediated droplet manipulations, and further wherein saiddevice contains at least one inductive heating element and at least onedetection zone; (b) providing on said device droplets comprising atarget nucleic acid, wherein said droplets comprise a subset of dropletsthat contains an agent for target nucleic acid detection and a subset ofdroplets that does not contain said agent for target nucleic aciddetection; (c) amplifying the target nucleic acid in each said dropletin parallel; (d) quantitating the amplified target nucleic acid in saidsubset of droplets containing said agent through detection of saidagent; and (e) after a desired amount of said target nucleic acid hasbeen obtained in said subset of droplets containing an agent, recoveringat least one droplet from said subset of droplets not containing anagent for further analyzing or processing.
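
The monitoring logic of steps (c) through (e) above, in which a labeled subset of droplets is quantitated and an unlabeled subset is recovered once a desired amount has been obtained, can be sketched as a simple control loop. The Python example below is a minimal sketch under stated assumptions: the signal reader is a hypothetical callback standing in for the detection zone, the simulated signal assumes idealized growth by a factor of (1 + efficiency) per cycle, and none of the names correspond to an actual device interface.

```python
def cycles_until_target(read_labeled_signal, target_signal, max_cycles=40):
    """Run amplification cycles until the labeled droplets reach the target signal.

    read_labeled_signal(cycle) is a hypothetical callback returning the signal
    measured in the labeled subset after the given cycle. Returns the cycle at
    which the unlabeled droplets would be recovered, or None if the target is
    not reached within max_cycles.
    """
    for cycle in range(1, max_cycles + 1):
        # All droplets are amplified in parallel; only the labeled subset is read.
        if read_labeled_signal(cycle) >= target_signal:
            return cycle
    return None


def simulated_signal(cycle, start=1.0, efficiency=1.0):
    """Idealized stand-in for a detector reading (assumed exponential growth)."""
    return start * (1.0 + efficiency) ** cycle


if __name__ == "__main__":
    # Hypothetical target: a 1000-fold increase over the starting signal.
    print(cycles_until_target(simulated_signal, 1000.0))
```

With the simulated doubling signal, the target of 1000 is first met at cycle 10. Stopping at that point rather than after a fixed number of cycles is one way to picture how such monitoring could limit overamplification of the recovered droplets.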

In some embodiments, detection of a genetic biomarker (e.g., one or more genetic biomarkers) can include any of the variety of methods described in P.C.T. Publication No. WO 2017/123316, which is hereby incorporated by reference in its entirety. For example, detection of a genetic biomarker can include a targeted sequencing workflow where an input sample comprising a sufficient quantity of genomic material is provided such that minimal or no amplification processes are required prior to sequencing. In some embodiments, the input sample is derived from an intact tumor or from lymph nodes. In some embodiments, the input sample is obtained through homogenization of an intact tumor sample (whole or partial) and/or one or more lymph nodes obtained from a patient or mammalian subject. In some embodiments, the input sample is derived from a sufficient quantity of blood, including whole blood or any fraction thereof. In some embodiments, the input sample is derived from cancerous tissue. In some embodiments, the input sample is derived from precancerous tissue. In some embodiments, the targeted sequencing workflow comprises one or more amplification steps (e.g., a pre-capture amplification step, an amplification step post-capture) prior to sequencing, where each amplification step prior to sequencing comprises from 0 to 3 amplification cycles, and wherein an aggregate of amplification cycles prior to sequencing does not exceed 4. In other embodiments, the targeted sequencing workflow comprises one or more amplification steps (e.g., a pre-capture amplification step, an amplification step post-capture) prior to sequencing, where each amplification step prior to sequencing comprises from 0 to 2 amplification cycles, and wherein an aggregate of amplification cycles prior to sequencing does not exceed 3. In yet other embodiments, the targeted sequencing workflow comprises one amplification step prior to sequencing (e.g., either a pre-capture amplification step or an amplification step post-capture), where the single amplification step prior to sequencing comprises from 0 to 3 amplification cycles. In further embodiments, the targeted sequencing workflow comprises one amplification step prior to sequencing, where the single amplification step prior to sequencing comprises from 1 to 3 cycles. In yet further embodiments, the targeted sequencing workflow comprises one amplification step prior to sequencing, where the single amplification step prior to sequencing comprises 1 cycle. In even further embodiments, the targeted sequencing workflow comprises one amplification step prior to sequencing, where the single amplification step prior to sequencing comprises 2 cycles. In some embodiments, either or both of the pre-capture amplification step or the amplification step post-capture but prior to sequencing utilizes LM-PCR.

Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing genomic material within a sample comprising: homogenizing a tumor sample and/or lymph node sample to provide a homogenized sample; isolating at least 0.5 micrograms of genomic material from the homogenized sample; preparing the at least 0.5 micrograms of isolated genomic material for sequencing; and sequencing the prepared genomic material. In some embodiments, the method does not comprise any amplification steps prior to sequencing. In some embodiments, the method comprises at least one pre-capture or post-capture amplification step, wherein an aggregate number of amplification cycles conducted during the at least one pre-capture or post-capture amplification step is at most 4 cycles. In some embodiments, the aggregate number of amplification cycles is 3. In some embodiments, the aggregate number of amplification cycles is 2. In some embodiments, the preparing of the at least 0.5 micrograms of isolated genomic material for sequencing comprises hybridizing the at least 0.5 micrograms of isolated genomic material to capture probes and capturing the hybridized genomic material. In some embodiments, an amount of captured genomic material ranges from about 90 ng to about 900 ng. In some embodiments, 1 or 2 amplification cycles are performed on the captured genomic material. In some embodiments, the homogenized sample comprises a representative sampling of cells. In some embodiments, at least 1 microgram of genomic material is isolated from the homogenized samples. In some embodiments, at least 5 micrograms of genomic material is isolated from the homogenized samples. In some embodiments, at least 10 micrograms of genomic material is isolated from the homogenized samples.

Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing DNA within a sample comprising isolating at least 0.5 micrograms of DNA from a blood sample; preparing the at least 0.5 micrograms of isolated DNA for sequencing; and sequencing the prepared DNA. In some embodiments, the method comprises 0 amplification steps prior to sequencing. In some embodiments, the preparing of the at least 0.5 micrograms of isolated DNA for sequencing comprises hybridizing the at least 0.5 micrograms of isolated genomic material to capture probes and capturing the hybridized genomic material. In some embodiments, an amount of captured genomic material ranges from about 90 ng to about 900 ng. In some embodiments, 1 or 2 amplification cycles are performed on the captured genomic material. In some embodiments, at least 1 microgram of DNA is isolated from the blood sample.

Additionally or alternatively, detection of a genetic biomarker can include a method of targeted representational sequencing comprising: (i) homogenizing at least a portion of a tumor, one or more whole or partial lymph nodes, or any combination thereof to provide a homogenized sample; (ii) extracting genomic material from the homogenized sample; (iii) capturing the extracted genomic material onto beads; and (iv) sequencing the captured genomic material; wherein the targeted representational sequencing comprises performing at most 4 amplification cycles prior to sequencing of the captured genomic material. In some embodiments, the at most 3 amplification cycles may be conducted prior to capture of the extracted genomic material or after capture of the extracted genomic material, or any combination thereof. In some embodiments, no pre-capture amplification cycles are conducted. In some embodiments, an amount of captured genomic material ranges from about 90 ng to about 900 ng. In some embodiments, from 1 to 3 amplification cycles are performed following capture of the extracted genomic material, but prior to sequencing. In some embodiments, at least 0.5 micrograms of genomic material is extracted from the homogenized sample. In some embodiments, at least 100 times more genomic material is derived from the homogenized sample as compared with an amount of input material used in a sequencing method requiring more than 4 amplification cycles.

Additionally or alternatively, detection of a genetic biomarker can include a method of sequencing DNA within a sample comprising: providing at least 0.5 micrograms of input genomic material, the at least 0.5 micrograms of genomic material derived from a tumor sample, a lymph node sample, or a blood sample; isolating DNA from the input genomic sample; preparing the isolated DNA for sequencing; and sequencing the prepared DNA, wherein the method does not comprise any amplification steps. In some embodiments, the at least 0.5 micrograms of input genomic material is derived from multiple histological and/or biopsy specimens. In some embodiments, the at least 0.5 micrograms of input genomic material is derived from a homogenized tumor sample. In some embodiments, the at least 0.5 micrograms of input genomic material is derived from a homogenized lymph node sample. In some embodiments, the at least 0.5 micrograms of input genomic material is a representative sampling of the tumor sample, lymph node sample, or blood sample from which it is derived. In some embodiments, the sequencing is performed using a next-generation sequencing method. In some embodiments, sequencing is performed using a synthesis sequencing methodology.

Additionally or alternatively, detection of a genetic biomarker can include a method of reducing PCR-introduced mutations during sequencing comprising isolating DNA from a sample comprising a sufficient amount of genomic material; preparing the isolated DNA for sequencing; and sequencing the prepared DNA, wherein the method comprises at most 3 amplification cycles prior to sequencing. In some embodiments, the method comprises 1 or 2 amplification cycles prior to sequencing. In some embodiments, a sufficient amount of input genomic material is an amount such that no pre-capture amplification cycles are utilized. In some embodiments, the sample is derived from a patient suspected of having cancer. In some embodiments, the sample is derived from a patient diagnosed with cancer. In some embodiments, the sample is derived from a patient at risk of developing cancer. In some embodiments, the sample is derived from healthy tissue samples. In some embodiments, 0.5 micrograms of DNA is isolated from the sample. In some embodiments, at least 1 microgram of genomic material is isolated from the sample. In some embodiments, at least 5 micrograms of genomic material is isolated from the sample. In some embodiments, at least 10 micrograms of genomic material is isolated from the sample.

Additionally or alternatively, detection of a genetic biomarker can include a sequencing method where PCR-introduced mutations are reduced, the sequencing method comprising capturing at least 0.05 micrograms of genomic material, and performing between 0 and 2 amplification cycles prior to sequencing. In some embodiments, 0 amplification cycles are conducted. In other embodiments, 1 amplification cycle is conducted. In yet other embodiments, 2 amplification cycles are conducted.

Additionally or alternatively, detection of a genetic biomarker can include a sequence capture method where PCR-introduced biases in the proportional representation of genome content are reduced, the sequence capture method comprising providing an input sample comprising at least 0.5 micrograms of genomic material, and where the sequence capture method comprises performing between 0 and 2 amplification cycles prior to sequencing. In some embodiments, 0 amplification cycles are conducted. In other embodiments, 1 amplification cycle is conducted. In yet other embodiments, 2 amplification cycles are conducted. In some embodiments, the input sample comprises at least 1 microgram of genomic material. In some embodiments, the input sample comprises at least 5 micrograms of genomic material. In some embodiments, the input sample comprises at least 10 micrograms of genomic material.

Additionally or alternatively, detection of a genetic biomarker can include a sequence capture method where PCR-introduced mutations are eliminated, the sequence capture method comprising preparing an input sample comprising at least 0.5 micrograms of genomic material. In some embodiments, the input sample comprises at least 1 microgram of genomic material. In some embodiments, the input sample comprises at least 5 micrograms of genomic material. In some embodiments, the input sample comprises at least 10 micrograms of genomic material.

Additionally or alternatively, detection of a genetic biomarker can include a sequence capture method where a step of removing PCR-duplicate reads prior to sequencing is eliminated, the sequence capture method comprising providing an input sample comprising at least 0.5 micrograms of genomic material. In some embodiments, the input sample comprises at least 1 microgram of genomic material. In some embodiments, the input sample comprises at least 5 micrograms of genomic material. In some embodiments, the input sample comprises at least 10 micrograms of genomic material.

Additionally or alternatively, detection of a genetic biomarker can include a sequencing method where PCR-introduced mutations are virtually eliminated, the sequencing method comprising capturing at least 0.05 micrograms of genomic material. In some embodiments, about 0.05 micrograms of genomic material are provided after capture of the genomic material. In some embodiments, 1 or 2 post-capture amplification cycles are performed prior to sequencing.

Examples of genetic biomarkers that can be detected using any of the variety of techniques described herein include, without limitation, ABCA7, ABL1, ABL2, ACVR1B, ACVR2A, AJUBA, AKT1, AKT2, ALB, ALDOB, ALK, AMBRA1, AMER1, AMOT, ANKRD46, APC, AR, ARHGAP35, ARHGEF12, ARID1A, ARID1B, ARID2, ARID4B, ARL15, ARMCX1, ASXL1, ASXL2, ATAD2, ATF1, ATG14, ATG5, ATM, ATRX, ATXN2, AXIN1, B2M, BAP1, BCL11A, BCL11B, BCL2, BCL3, BCL6, BCL9, BCLAF1, BCOR, BCR, BIRC6, BIRC8, BLM, BLVRA, BMPR1A, BRAF, BRCA1, BRCA2, BRD7, BRE, BRWD3, BTBD7, BTRC, C11orf70, C12orf57, C2CD5, C3orf62, C8orf34, CAMKV, CAPG, CARD11, CARS, CASP8, CBFA2T3, CBFB, CBLC, CBX4, CCAR1, CCDC117, CCDC88A, CCM2, CCNC, CCND1, CCND2, CCND3, CCR3, CD1D, CD79B, CDC73, CDCP1, CDH1, CDH11, CDK12, CDK4, CDK6, CDKN1A, CDKN1B, CDKN2A, CDX2, CEBPA, CELF1, CENPB, CEP128, CHD2, CHD4, CHD8, CHEK2, CHRDL1, CHUK, CIC, CLEC4C, CMTR2, CNN2, CNOT1, CNOT4, COL11A1, COPS4, COX7B2, CREB1, CREBBP, CSDE1, CSMD3, CTCF, CTDNEP1, CTNNB1, CUL1, CUL2, CYB5B, CPLD, DACH1, DCHS1, DCUN1D1, DDB2, DDIT3, DDX3X, DDX5, DDX6, DEK, DHX15, DHX16, DICER1, DIRC2, DIS3, DIXDC1, DKK2, DNAJB5, DNER, DNM1L, DNMT3A, EED, EGFR, EIF1AX, EIF2AK3, EIF2S2, EIF4A1, EIF4A2, ELF3, ELK4, EMG1, EMR3, EP300, EPB41L4A, EPHA2, EPS8, ERBB2, ERBB3, ERRFI1, ETV4, ETV6, EVI1, EWSR1, EXO5, EXT1, EXT2, EZH2, F5, FANCM, FAT1, FBN2, FBXW7, FCER1G, FEV, FGF2, FGFR1, FGFR1OP, FGFR2, FGFR3, FH, FLT3, FN1, FOXA1, FOXP1, FUBP1, FUS, GALNTL5, GATA3, GGCT, GIGYF2, GK2, GLIPR2, GNAS, GNPTAB, GNRHR, GOLGA5, GOLM1, GOPC, GOT2, GPC3, GPS2, GPX7, GRK1, GSE1, GZMA, HDAC1, HERC1, HERC4, HGF, HIST1H2BO, HLA-A, HLA-B, HMCN1, HMGA1, HMGA2, HNRNPA1, HRAS, HSP90AB1, ID3, IDH1, IDH2, IFNGR2, IFT88, IKZF2, IL2, INO80C, INPP4A, INPPL1, IRF4, IWS1, JAK1, JAK2, JUN, KANSL1, KATE, KATNAL1, KBTBD7, KCNMB4, KDM5C, KDM6A, KEAP1, KIAA1467, KIT, KLF4, KMT2A, KMT2B, KMT2C, KMT2D, KMT2E, KRAS, KRT15, LAMTOR1, LARP4B, LCK, LMO2, LPAR2, LYN, MAF, MAFB, MAML2, MAP2K1, MAP2K2, MAP2K4, MAP3K1, MAP4K3, MAPK1, MAX, MB21D2, MBD1, MBD6, MBNL1, MBNL3, MDM2, MDM4, MED12, MED23, MEN1, MET, MGA, MITF, MKLN1, MLH1, MLL, MLLT4, MOAP1, MORC4, MPL, MS4A1, MSH2, MSI1, MTOR, MYB, MYC, MYCL1, MYCN, MYD88, MYL6, MYO1B, MYO6, NAA15, NAA25, NAP1L2, NAP1L4, NCOA2, NCOA4, NCOR1, NEK9, NF1, NF2, NFE2L2, NFE2L3, NFKB2, NIPBL, NIT1, NKX3-1, NME4, NOTCH1, NOTCH2, NPM1, NR4A3, NRAS, NSD1, NTRK1, NUP214, NUP98, PALB2, PAX8, PBRM1, PCBP1, PCOLCE2, PDGFB, PHF6, PIK3CA, PIK3CB, PIK3R1, PIM1, PLAG1, PML, POLA2, POT1, PPARD, PPARG, PPM1D, PPP2R1A, PPP6C, PRKACA, PRKCI, PRPF40A, PSIP1, PTEN, PTH2, PTMS, PTN, PTPN11, RAB18, RAC1, RAF1, RANBP3L, RAPGEF6, RASA1, RB1, RBBP6, RBM10, RBM26, RC3H2, REL, RERE, RET, RFC4, RHEB, RHOA, RIMS2, RIT1, RNF111, RNF43, ROS1, RPL11, RPL5, RQCD1, RRAS2, RUNX1, RXRA, SARM1, SCAF11, SDHB, SDHD, SEC22A, SENP3, SENP8, SETD1B, SETD2, SF3A3, SF3B1, SFPQ, SIN3A, SKAP2, SMAD2, SMAD3, SMAD4, SMARCA4, SMARCB1, SMARCC2, SMO, SNCB, SOCS1, SOS1, SOX4, SOX9, SP3, SPEN, SPOP, SPSB2, SS18, STAG2, STK11, STK31, SUFU, SUZ12, SYK, TAF1A, TARDBP, TAS2R30, TBL1XR1, TBX3, TCF12, TCF3, TCF7L2, TCL1A, TET2, TEX11, TFDP2, TFG, TGFBR2, THRAP3, TLX1, TM9SF1, TMCO2, TMED10, TMEM107, TMEM30A, TMPO, TNFAIP3, TNFRSF9, TNRC6B, TP53, TP53BP1, TPR, TRAF3, TRIMS, TRIP12, TSC1, TSC2, TTK, TTR, TUBA3C, U2AF1, UBE2D3, UBR5, UNC13C, UNKL, UPP1, USO1, USP28, USP6, USP9X, VHL, VN1R2, VPS33B, WAC, WDR33, WDR47, WRN, WT1, WWP1, XPO1, YOD1, ZC3H13, ZDHHC4, ZFHX3, ZFP36L1, ZFP36L2, ZGRF1, ZMYM3, ZMYM4, ZNF234, ZNF268, ZNF292, ZNF318, ZNF345, ZNF600, ZNF750, and/or ZNF800.

As used herein, “TP53” refers to the gene and/or the protein encoded by the gene, which is tumor suppressor protein p53 involved in the regulation of cell proliferation. The TP53 gene plays a crucial role in preventing cancer formation. The TP53 gene encodes proteins that bind to DNA and regulate gene expression to prevent mutations of the genome. Human TP53 sequences are known in the art (e.g., GenBank accession numbers NM_001276761, NM_000546, NM_001126112, NM_001126113, and NM_001126114). One of ordinary skill in the art can identify additional TP53 sequences and variants thereof.

As used herein, “PIK3CA” refers to the gene and/or the protein encoded by the gene, which is the catalytic subunit of phosphoinositide 3-kinase (PI3K), isoform alpha, also referred to as p110alpha. PIK3CA has been found to be oncogenic and has been implicated in a variety of cancers. Human PIK3CA sequences are known in the art (e.g., GenBank accession number NM_006218). One of ordinary skill in the art can identify additional PIK3CA sequences and variants thereof.

As used herein, the term “FGFR3” refers to the gene and/or the protein encoded by the gene, which is fibroblast growth factor receptor 3 (FGFR3), which belongs to a family of structurally related tyrosine kinase receptors (FGFRs 1-4) encoded by four different genes. The extracellular portion of the protein interacts with fibroblast growth factors, setting in motion a cascade of downstream signals which ultimately influence cell mitogenesis and differentiation. Human FGFR3 sequences are known in the art (e.g., GenBank accession numbers NM_000142, NM_001163213, NM_022965, NM_001354809, and NM_001354810). One of ordinary skill in the art can identify additional FGFR3 sequences and variants thereof.

As used herein, the term “KRAS” refers to the gene and/or the protein encoded by the gene known as K-ras or Ki-ras, which is a proto-oncogene corresponding to the oncogene first identified in Kirsten rat sarcoma virus; the gene product was first found as a p21 GTPase. Human KRAS sequences are known in the art (e.g., GenBank accession numbers NM_004985 and NM_033360). One of ordinary skill in the art can identify additional KRAS sequences and variants thereof.

As used herein, the term “ErbB2” refers to the gene and/or the protein encoded by the gene, which is also known as v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2, c-erbB2/neu, her2/neu, or Her2. ErbB2 is a member of the epidermal growth factor receptor family of tyrosine kinases. It is amplified and/or overexpressed in several cancers, including breast and ovarian cancer. Human ErbB2 sequences are known in the art (e.g., GenBank accession numbers NM_001005862, NM_001289936, NM_001289937, NM_001289938, and NM_004448). One of ordinary skill in the art can identify additional ErbB2 sequences and variants thereof.

As used herein, “CDKN2A” refers to the gene and/or the protein encoded by the gene, which is known as cyclin-dependent kinase inhibitor 2A and acts as a tumor suppressor by regulating the cell cycle. Human CDKN2A sequences are known in the art (e.g., GenBank accession numbers NM_000077, NM_001195132, NM_058195, NM_058196, and NM_058197). One of ordinary skill in the art can identify additional CDKN2A sequences and variants thereof.

As used herein, the term “MLL” refers to the gene and/or the protein encoded by the gene, which is lymphoid or mixed-lineage leukemia 2. MLL is a major mammalian histone H3 lysine 4 (H3K4) mono-methyltransferase. MLL protein co-localizes with lineage determining transcription factors on transcriptional enhancers and is essential for cell differentiation and embryonic development. MLL also plays critical roles in regulating cell fate transition, metabolism, and tumor suppression. Mutations in MLL have been associated with Kabuki Syndrome, congenital heart disease, and various forms of cancer. Human MLL sequences are known in the art (e.g., GenBank accession number NM_003482). One of ordinary skill in the art can identify additional MLL sequences and variants thereof.

As used herein, the term “HRAS” refers to the gene and/or the protein encoded by the gene, which is Harvey rat sarcoma viral oncogene homolog, a small G protein that activates the MAP kinase pathway. HRAS is involved in regulating cell division in response to growth factor stimulation. HRAS has been shown to be a proto-oncogene. When mutated, proto-oncogenes have the potential to cause normal cells to become cancerous. Human HRAS sequences are known in the art (e.g., GenBank accession numbers NM_001130442, NM_005343, NM_176795, and NM_001318054). One of ordinary skill in the art can identify additional HRAS sequences and variants thereof.

As used herein, the term “MET” refers to the gene and/or the protein encoded by the gene. The MET gene encodes c-Met, also called tyrosine-protein kinase Met or hepatocyte growth factor receptor (HGFR). MET is a single pass tyrosine kinase receptor essential for embryonic development, organogenesis and wound healing. Hepatocyte growth factor/Scatter Factor (HGF/SF) and its splicing isoforms (NK1, NK2) are the only known ligands of the MET receptor. MET is normally expressed by cells of epithelial origin, while expression of HGF/SF is restricted to cells of mesenchymal origin. When HGF/SF binds its cognate receptor MET, it induces dimerization of the receptor through a not yet completely understood mechanism, leading to its activation. Human MET sequences are known in the art (e.g., GenBank accession numbers NM_000245, NM_001127500, NM_001324401, and NM_001324402). One of ordinary skill in the art can identify additional MET sequences and variants thereof.

As used herein, the term “VHL” refers to the gene and/or the protein encoded by the gene. The VHL gene is the Von Hippel-Lindau tumor suppressor gene. A germline mutation of the VHL gene is the basis of familial inheritance of Von Hippel-Lindau syndrome, a dominantly inherited hereditary cancer syndrome predisposing to a variety of malignant and benign tumors of the eye, brain, spinal cord, kidney, pancreas, and adrenal glands. Human VHL sequences are known in the art (e.g., GenBank accession numbers NM_000551, NM_198156, and NM_001354723). One of ordinary skill in the art can identify additional VHL sequences and variants thereof.

Scoring for Genetic Mutations

The present disclosure provides methods of identifying the presence of cancer in a subject with high sensitivity and specificity based at least in part on the presence of one or more genetic biomarkers. Various methods are provided herein to determine whether the subject has cancer and/or the likelihood that the subject has cancer. In some embodiments, these methods involve various types of statistical techniques and methods, including, e.g., scoring methods, regression analysis, clustering, principal component analysis, nearest neighbor classifier analysis (e.g., k-nearest neighbors algorithm), linear discriminant analysis, neural networks, and support vector machines.

In some embodiments, one or more genetic biomarkers can be used to generate a score. The score can indicate the likelihood that the subject has a cancer or does not have a cancer. In some embodiments, the likelihood is generated by comparing the mutation allele frequency of each mutation in one or more genetic biomarkers to a reference distribution of mutation allele frequency. As described herein, genomic segments containing one or more biomarkers can be amplified by a set of primers. The set of primers can have one or more pairs of primers that amplify one or more non-overlapping genomic segments. The same set of primers can be used to amplify template DNA collected from a subject in one or more wells (e.g., equal to, or more than 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20, or 30 wells); thus, the amplification process can provide duplicate signals for mutants (e.g., rare mutations) that are detectable in multiple wells. In some embodiments, the PCR products can be subject to one or more rounds of additional amplification before sequencing. In some embodiments, reads from a common template molecule can then be grouped, e.g., based on the unique identifier sequences (UIDs) that are incorporated as molecular barcodes. In some embodiments, artifactual mutations that are introduced during the sample preparation or sequencing steps can be reduced by requiring a mutation to be present in, e.g., greater than 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% of reads in each UID family. In some embodiments, redundant reads arising from optical duplication can be eliminated by requiring reads with the same UID and sample index to be at least 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, or 9000 pixels apart when located on the same tile.
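The UID-family filter described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `consensus_mutations_by_uid`, the `(uid, mutation)` read representation, the mutation labels, and the 95% default cutoff (one of the listed values) are assumptions made for the example.

```python
from collections import defaultdict


def consensus_mutations_by_uid(reads, min_fraction=0.95):
    """Return {uid: mutation} for UID families dominated by one mutation.

    reads: iterable of (uid, mutation) pairs, where mutation is a label such
    as "KRAS_G12D", or None for a read that matches the reference.
    A family passes only if more than min_fraction of its reads carry the
    same mutation (hypothetical default of 95%).
    """
    if not 0.5 <= min_fraction < 1.0:
        raise ValueError("min_fraction must be in [0.5, 1.0)")

    # Group reads that share a UID (i.e., derive from one template molecule).
    families = defaultdict(list)
    for uid, mutation in reads:
        families[uid].append(mutation)

    calls = {}
    for uid, observed in families.items():
        counts = defaultdict(int)
        for mutation in observed:
            if mutation is not None:
                counts[mutation] += 1
        # With min_fraction >= 0.5, at most one mutation can pass per family.
        for mutation, count in counts.items():
            if count / len(observed) > min_fraction:
                calls[uid] = mutation
    return calls


# Illustrative data: family A is 20/20 mutant, family B is 9/10 mutant.
reads = [("UID_A", "KRAS_G12D")] * 20 + [("UID_B", "KRAS_G12D")] * 9 + [("UID_B", None)]
calls = consensus_mutations_by_uid(reads)
```

In this example only family A is retained, because 90% of the reads in family B carry the mutation and the assumed cutoff requires more than 95%.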

In some embodiments, mutations that meet one or both of the following two criteria are considered: (i) present in the Catalogue of Somatic Mutations in Cancer (COSMIC) database, or (ii) predicted to be inactivating in tumor suppressor genes (nonsense mutations, out-of-frame insertions or deletions, canonical splice site mutations). In some embodiments, synonymous mutations, except those at exon ends, and intronic mutations, except for those at splice sites, are excluded. These selected mutations are referred to as supermutants. Thus, in some embodiments, supermutants include, e.g., mutations present in the Catalogue of Somatic Mutations in Cancer (COSMIC) database, mutations predicted to be inactivating in tumor suppressor genes (nonsense mutations, out-of-frame insertions or deletions, canonical splice site mutations), non-synonymous mutations, mutations that can affect splicing, and/or mutations that can affect expression.

Thus, as used herein, the term “mutant allele frequency” or “MAF” within a sample (e.g., a well, a test sample) refers to the proportion of UIDs in the sample that have such a mutation. The MAF reflects the mutant fraction within each sample (e.g., each well) and represents an independent sampling of the mutant allele frequency in the sample of interest. In some embodiments, the MAF of a mutation in a sample (rather than the well) can be calculated by the total number of mutants present in all wells for the sample (e.g., a sample collected from a subject) divided by the total number of UIDs.
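As a minimal sketch of the two MAF definitions above (per well and per sample), assuming each well is summarized as a hypothetical pair of counts `(mutant_uids, total_uids)`:

```python
def well_maf(mutant_uids, total_uids):
    """MAF within one well: mutant UIDs divided by all UIDs in that well."""
    if total_uids <= 0:
        raise ValueError("total_uids must be positive")
    return mutant_uids / total_uids


def sample_maf(wells):
    """MAF for a whole sample: mutants summed over all wells divided by
    UIDs summed over all wells.

    wells: list of (mutant_uids, total_uids) pairs, one pair per well.
    """
    total_mutants = sum(mutants for mutants, _ in wells)
    total_uids = sum(uids for _, uids in wells)
    return well_maf(total_mutants, total_uids)


# Illustrative counts for three wells of one sample.
wells = [(2, 1000), (0, 1000), (1, 2000)]
per_well = [well_maf(m, u) for m, u in wells]
overall = sample_maf(wells)  # 3 mutant UIDs out of 4000 UIDs
```

Note that the sample-level MAF pools counts across wells; it is not the unweighted mean of the per-well MAFs.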

In some embodiments, MAF normalization is performed. In some embodiments, all mutations that do not have at least one supermutant in at least one well are excluded from the analysis. For example, the mutant allele frequency (MAF) can reflect the ratio between the total number of supermutants in each well from that sample and the total number of UIDs in the same well from that sample. In some embodiments, the MAF is first normalized based on the observed MAFs for each mutation in a set of normal controls comprising the normal plasmas in the training set. In some embodiments, mutations with fewer than 100 UIDs are excluded. The normalization can be performed by standard normalization (i.e., subtracting the mean and dividing by the standard deviation) or by multiplying the MAF by a predetermined ratio. In some embodiments, the normalization is performed by first calculating the average MAF (ave_i) for each mutation i = 1, ..., n found among the normal controls. Using the 25th percentile of the distribution generated by these averages as the reference value (ave_ref), each MAF can be normalized by multiplying it by the ratio ave_ref/ave_i. For example, if the observed average MAF of a mutation in a set of controls is 10 times higher than ave_ref, then each MAF for that mutation can be multiplied by 1/10.
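The ratio-based normalization can be illustrated with a short sketch. The function name, the dictionary layout, and the mutation labels are hypothetical, and the percentile is computed with the inclusive (linear interpolation) method of Python's `statistics.quantiles`, which is an assumption since the text does not state how the 25th percentile is estimated.

```python
from statistics import mean, quantiles


def normalization_ratios(control_mafs):
    """Return (ave_ref, {mutation: ave_ref / ave_i}).

    control_mafs: dict mapping each mutation to the list of MAFs observed
    for that mutation in the normal controls. At least two mutations are
    needed to estimate a percentile.
    """
    averages = {mutation: mean(values) for mutation, values in control_mafs.items()}
    # 25th percentile of the per-mutation averages (assumed interpolation rule).
    ave_ref = quantiles(averages.values(), n=4, method="inclusive")[0]
    # Mutations never seen in controls (average of 0) are skipped here;
    # how to treat them is not specified in the text.
    ratios = {m: ave_ref / ave for m, ave in averages.items() if ave > 0}
    return ave_ref, ratios


# Illustrative control data: mutation "e" is noisier than the others.
control_mafs = {
    "a": [1e-4, 1e-4],
    "b": [2e-4, 2e-4],
    "c": [3e-4, 3e-4],
    "d": [4e-4, 4e-4],
    "e": [2e-3, 2e-3],
}
ave_ref, ratios = normalization_ratios(control_mafs)
normalized = 5e-3 * ratios["e"]  # an observed MAF for "e", scaled by ave_ref/ave_e
```

Here the average control MAF of mutation "e" is 10 times ave_ref, so each MAF observed for "e" is multiplied by 1/10, matching the example in the text.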

The classification of a sample's genetic biomarker status can be obtained, e.g., from a statistical test (i) by comparing the MAF of one or more mutations in the selected genetic biomarkers to a reference distribution of mutation allele frequency for the mutations in a group of control samples, (ii) by comparing the mutation allele frequency of one or more mutations in the selected genetic biomarkers to a first reference distribution of mutation allele frequency in control samples and a second reference distribution of mutation allele frequency in samples collected from subjects having a cancer, or (iii) by comparing the mutation allele frequency of one or more mutations in the selected genetic biomarkers to the maximum mutation allele frequency of the mutation in control samples.

The control reference distribution and the maximum mutation allele frequency of the mutation can be determined from control samples. In some embodiments, the group of control samples has at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, or more control subjects. In some embodiments, the control subjects are healthy subjects, or at least do not have cancer, or are not suspected to have cancer. The control samples collected from these control subjects can be amplified and sequenced. The mutations in one or more selected genetic biomarkers can be determined. Thus, a MAF for a particular mutation in one subject can be determined, and the distribution of MAF in control samples can be determined from the group of control subjects. Similarly, the maximum mutation allele frequency of the mutation can also be determined in control samples.

In some embodiments, the MAF of one or more mutations in the selected genetic biomarkers can be compared against the reference distribution, thereby obtaining a score that indicates the likelihood or the probability that the subject has cancer. In some embodiments, if the score (e.g., likelihood or probability) is equal to or greater than a reference threshold, it can be determined that the subject is likely to have cancer; otherwise, it can be determined that the subject is not likely to have cancer. In some embodiments, the comparison can provide a score that indicates the likelihood or probability that the subject does not have cancer. In some embodiments, if that score (e.g., likelihood or probability) is equal to or less than a reference threshold, it can be determined that the subject is likely to have cancer; otherwise, it can be determined that the subject is not likely to have cancer.

In some embodiments, the MAF is first normalized based on the observed MAFs in a set of normal controls for each mutation. Following this mutation-specific normalization, the MAF of each mutation in each well is compared to a reference distribution of MAFs built from normal controls with all mutations included, and a p-value is calculated from this distribution. In some embodiments, the mutation with the lowest p-value among all mutations detected in a given sample is deemed the “top mutation”. The classification of a sample's ctDNA status is based on whether the p-value of this top mutation is below or above a given threshold. The threshold can be selected based on a desired specificity observed among an independent set of normal controls.
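The top-mutation rule can be sketched as below. The text does not specify how the p-value is derived from the reference distribution; this example assumes an empirical right-tail p-value with a +1 correction, and the function names, mutation labels, and the 0.05 threshold are hypothetical.

```python
def empirical_p_value(observed_maf, reference_mafs):
    """Right-tail empirical p-value (assumed estimator): the fraction of
    reference MAFs at least as large as the observed MAF, with a +1
    correction so the p-value is never exactly zero."""
    exceed = sum(1 for ref in reference_mafs if ref >= observed_maf)
    return (exceed + 1) / (len(reference_mafs) + 1)


def classify_by_top_mutation(sample_mafs, reference_mafs, threshold):
    """Return (top_mutation, p_value, is_positive) for one sample.

    sample_mafs: dict mapping mutation -> normalized MAF in the sample.
    The top mutation is the one with the lowest p-value; the sample is
    called positive when that p-value is below the threshold.
    """
    p_values = {m: empirical_p_value(maf, reference_mafs) for m, maf in sample_mafs.items()}
    top, p_value = min(p_values.items(), key=lambda item: item[1])
    return top, p_value, p_value < threshold


# Illustrative reference distribution of 99 normalized control MAFs.
reference = [k / 100000 for k in range(1, 100)]
sample = {"TP53_R175H": 0.002, "KRAS_G12D": 0.000505}
top, p_value, positive = classify_by_top_mutation(sample, reference, threshold=0.05)
```

In practice the threshold would be chosen from an independent set of normal controls to give the desired specificity, as stated above.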

In some embodiments, the Stouffer's Z-score is used to combine hypothesis test results from two or more independent tests (e.g., test results from 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12 wells). For example, the MAF results for two or more mutations can be combined into one single test. In some embodiments, a sample is scored as positive when the Stouffer's Z-score is greater than a reference threshold. In some embodiments, a sample is scored positive if the ratio of the Stouffer's Z-score to the average of the first few (e.g., 2, 3, 4, 5, 6, 7, 8, 9, or 10) highest Stouffer's Z-scores in the controls is greater than a reference threshold.
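For reference, the unweighted Stouffer's method combines k independent z-scores as the sum of the z-scores divided by the square root of k. The sketch below assumes one-sided p-values per well and uses unweighted combination; the text does not state whether weights are applied, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_z(z_scores):
    """Unweighted Stouffer's Z: sum of z-scores divided by sqrt(count)."""
    z_scores = list(z_scores)
    if not z_scores:
        raise ValueError("at least one z-score is required")
    return sum(z_scores) / sqrt(len(z_scores))


def stouffer_z_from_p(p_values):
    """Convert one-sided p-values (strictly between 0 and 1) to z-scores
    with the inverse standard normal CDF, then combine them."""
    standard_normal = NormalDist()
    return stouffer_z(standard_normal.inv_cdf(1.0 - p) for p in p_values)


# Four wells that each give z = 1 combine to a stronger signal, Z = 2.
combined = stouffer_z([1.0, 1.0, 1.0, 1.0])
neutral = stouffer_z_from_p([0.5, 0.5])
```

A sample would then be scored positive when the combined Z exceeds a reference threshold, or when its ratio to the average of the highest control Z-scores exceeds a threshold.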

In some embodiments, the MAF of one or more mutations in the selected genetic biomarkers is compared to the maximum mutation allele frequency of the mutation in control samples. If one or more mutations (e.g., equal to or at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200) have a MAF that is greater than the maximum mutation allele frequency of the mutation in control samples, the subject can be determined to have cancer. In some embodiments, a score can be obtained from the comparison. In some embodiments, the score is the total number of mutations that have a MAF that is greater than the maximum mutation allele frequency of the mutation in control samples. If the score is greater than a reference threshold, then it can be determined that the subject is likely to have cancer. In some embodiments, the average MAF of one or more mutations in the selected genetic biomarkers is calculated.
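The count-based score described above can be sketched in a few lines. The function name and data layout are hypothetical, and the handling of mutations that were never measured in controls (they are not counted) is an assumption, since the text does not address that case.

```python
def count_above_control_max(sample_mafs, control_max_mafs):
    """Score = number of mutations whose MAF in the sample exceeds the
    maximum MAF seen for the same mutation in control samples.

    Mutations absent from control_max_mafs are not counted (assumption).
    """
    return sum(
        1
        for mutation, maf in sample_mafs.items()
        if mutation in control_max_mafs and maf > control_max_mafs[mutation]
    )


# Illustrative values only.
control_max = {"TP53_R175H": 0.0010, "KRAS_G12D": 0.0008, "PIK3CA_H1047R": 0.0012}
sample = {"TP53_R175H": 0.0040, "KRAS_G12D": 0.0005, "PIK3CA_H1047R": 0.0020}
score = count_above_control_max(sample, control_max)
likely_cancer = score > 1  # hypothetical reference threshold of 1
```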

In some embodiments, the MAF of one or more mutations in the selected genetic biomarkers can be compared to a first reference distribution of mutation allele frequency in control samples and a second reference distribution of mutation allele frequency in samples collected from subjects having cancer. In some embodiments, a score is obtained by comparing the mutation frequencies of the sample of interest to the distributions of the mutation frequencies of, respectively, normal and cancer samples in the training set.

In some embodiments, the UID range for each mutation is split into 10 intervals (e.g., <1,000, 1,000-2,000, . . . , 8,000-9,000, >9,000). Depending on the number of UIDs, the MAF of each mutation in each well can be compared to two reference distributions of MAFs built from samples in the corresponding UID range: 1) a distribution built from all the normal control samples in the training set; and 2) a distribution built from the samples from cancer patients in the training set. In some embodiments, the cancer training set includes only those in which the same mutation is present in the sample (e.g., plasma) and in the corresponding primary tumor, with an MAF > 5% in the tumor. Corresponding p-values, pN and pC, can be obtained. The reference distributions for both the normal and cancer samples can be built independently, from the training sets, in each round and each iteration of 10-fold cross-validation, i.e., 90% of the samples in each iteration are used for training and 10% of the samples are used for testing.
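Selecting the reference distributions by UID range amounts to a simple binning step, sketched below. The function name is hypothetical, and the treatment of exact boundaries (e.g., a well with exactly 1,000 or 9,000 UIDs) is an assumption because the listed intervals do not say which side is inclusive.

```python
def uid_interval(uid_count, width=1000, n_intervals=10):
    """Map a UID count to one of 10 intervals:
    0 for <1,000, 1 for 1,000-2,000, ..., 9 for counts of 9,000 or more.

    Boundary values are assigned to the higher interval (assumption).
    """
    if uid_count < 0:
        raise ValueError("uid_count must be non-negative")
    return min(uid_count // width, n_intervals - 1)


# The returned index would select the normal and cancer reference
# distributions built from training wells in the same UID range.
bins = [uid_interval(n) for n in (500, 1500, 8999, 9500, 25000)]
```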

For each mutation, an omega score can be obtained. The log ratio of these two p-values, ln(pC/pN), can first be calculated for each well. In some embodiments, the minimum and maximum of these log ratios across the replicate wells can be eliminated so that the results will be less sensitive to outliers. In some embodiments, a log-likelihood ratio is used. As compared to the log-likelihood ratio, the log ratio of the p-values can provide some additional advantages, because the relatively low number of data points available does not allow a robust estimation of the densities of the MAF distributions (particularly for pC). Thus, in some embodiments, an “omega” score is then determined according to the following formula:

$\Omega = {\sum\limits_{i = 1}{w_{i}*\ln \frac{p_{i}^{C}}{p_{i}^{N}}}}$

where w_i is the number of UIDs in well i divided by the total number of UIDs for that mutation in the wells that are included in the analysis. In some embodiments, the total number of wells that are included in the analysis is 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20, 30, or more. In some embodiments, the wells with the maximum log ratio and the minimum log ratio can be excluded from the analysis. In some embodiments, the total number of wells that are included in the analysis is 1; thus, the omega score can be obtained by the following formula instead:

$\Omega = {\ln \frac{p_{i}^{C}}{p_{i}^{N}}}$

The log ratio of p-values can be weighted so that those wells containing more template molecules have a greater impact on the final statistic (the omega score). The rationale for this weighting is that the larger the number of template molecules in a well, the greater the confidence in the result.
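By way of non-limiting illustration, the omega score calculation can be sketched as follows. The function name `omega_score`, the tuple layout of the input, and the rule that trimming is applied only when more than two wells are available are assumptions made for the example; the p-values are assumed to be strictly positive.

```python
import math


def omega_score(wells, trim_extremes=True):
    """Compute a UID-weighted sum of ln(pC / pN) over replicate wells.

    wells: list of (uid_count, p_cancer, p_normal) tuples, one per well.
    trim_extremes: drop the wells with the minimum and maximum log ratio
    (assumption: only done when more than two wells are available).
    """
    ratios = [(uids, math.log(p_c / p_n)) for uids, p_c, p_n in wells]
    if trim_extremes and len(ratios) > 2:
        ratios.sort(key=lambda item: item[1])
        ratios = ratios[1:-1]  # remove the minimum and maximum log ratios
    total_uids = sum(uids for uids, _ in ratios)
    # Each weight is the well's share of the UIDs among the retained wells.
    return sum((uids / total_uids) * log_ratio for uids, log_ratio in ratios)
```

With a single well the weight is 1, so the result reduces to the single-well formula ln(pC/pN).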

In some embodiments, an Ω score for each mutation can be determined. In some embodiments, mutations with Ω scores greater than a reference score (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) are selected. The total number of mutations with Ω scores greater than a reference score reflects the mutation burden in a subject. In some embodiments, if the total number of mutations with Ω scores greater than a reference score is greater than a reference threshold, it is determined that the subject is likely to have cancer; otherwise, it can be determined that the subject is not likely to have cancer.

In some embodiments, the mutation with the greatest Ω score is deemed the “top mutation”. The Ω score for the top mutation can be used to determine whether a subject is likely to have cancer. For example, the Ω score for the top mutation can be compared against a reference threshold, or be combined with some other information (e.g., protein biomarkers) in various methods (e.g., regression analysis) to determine the likelihood that a subject has cancer. In some embodiments, if the Ω score for the top mutation is greater than a reference threshold, it is determined that the subject is likely to have cancer; otherwise, it can be determined that the subject is not likely to have cancer. In some embodiments, the Ω score is used in a regression analysis (e.g., logistic regression).
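By way of non-limiting illustration, the two decision rules described above (mutation burden and top mutation) can be sketched as follows. The function names and the cutoff values used in any call are hypothetical and are chosen only for the example.

```python
def top_mutation(omega_by_mutation):
    """Return (mutation, score) for the mutation with the greatest omega score."""
    return max(omega_by_mutation.items(), key=lambda item: item[1])


def mutation_burden(omega_by_mutation, reference_score):
    """Count the mutations whose omega score exceeds the reference score."""
    return sum(1 for score in omega_by_mutation.values() if score > reference_score)


def likely_cancer_by_burden(omega_by_mutation, reference_score, burden_threshold):
    """Burden rule: positive when the count of high-scoring mutations exceeds the threshold."""
    return mutation_burden(omega_by_mutation, reference_score) > burden_threshold


def likely_cancer_by_top(omega_by_mutation, reference_threshold):
    """Top-mutation rule: positive when the greatest omega score exceeds the threshold."""
    return top_mutation(omega_by_mutation)[1] > reference_threshold
```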

Detecting Protein Biomarkers

In some embodiments, the presence of a protein biomarker may be detected in any of a variety of biological samples isolated or obtained from a subject (e.g., a human subject) including, but not limited to, blood, plasma, serum, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. Any protein biomarker known in the art may be detected when a threshold value is obtained above which normal, healthy human subjects do not fall, but human subjects with cancer do fall.

Any appropriate method can be used to detect the level of one or more protein biomarkers as described herein. In some embodiments, the level of one or more protein biomarkers is compared to a predetermined threshold. In some embodiments, the predetermined threshold is a general or global threshold. In other cases, the predetermined threshold is a threshold that is relevant to a particular protein biomarker. In some embodiments, the level of the one or more protein biomarkers is compared to an absolute amount of a reference protein biomarker. In some embodiments, the level of the one or more protein biomarkers is relative to an amount of a reference protein biomarker. In some embodiments, the level of the one or more protein biomarkers is an elevated level. In some embodiments, the level of the one or more protein biomarkers is above a predetermined threshold. In other cases, the level of the one or more protein biomarkers is within a predetermined threshold range. In some embodiments, the level of the one or more protein biomarkers is or approximates a predetermined threshold. In some embodiments, the level of the one or more protein biomarkers is below a predetermined threshold. In some embodiments, the level of the one or more protein biomarkers from a biological sample is lower than a particular threshold. In some embodiments, the level of the one or more protein biomarkers from a biological sample is depressed compared to a predetermined threshold.

In some embodiments, methods provided herein for selecting a subject forfurther diagnostic testing and/or increased monitoring include detectinga protein biomarker in the biological sample and comparing the amount ofprotein biomarker in the biological sample to a reference level in areference sample. In some embodiments, methods for selecting a subjectfor further diagnostic testing and/or increased monitoring includedetecting a protein biomarker in the biological sample and comparing theamount of protein biomarker in the biological sample to a referencelevel, wherein the reference level is a composite number derived frommultiple reference samples. In some embodiments, the protein biomarkerin the biological sample is at least 5% higher than a reference level.In some embodiments, the protein biomarker in the biological sample isat least 10% higher than a reference level. In some embodiments, theprotein biomarker in the biological sample is at least 15% higher than areference level. In some embodiments, the protein biomarker in thebiological sample is at least 20% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is at least25% higher than a reference level. In some embodiments, the proteinbiomarker in the biological sample is at least 30% higher than areference level. In some embodiments, the protein biomarker in thebiological sample is at least 40% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is at least50% higher than a reference level. In some embodiments, the proteinbiomarker in the biological sample is at least 60% higher than areference level. In some embodiments, the protein biomarker in thebiological sample is at least 70% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is at least80% higher than a reference level. 
In some embodiments, the proteinbiomarker in the biological sample is at least 90% higher than areference level. In some embodiments, the protein biomarker in thebiological sample is at least 100% higher than a reference level. Insome embodiments, the protein biomarker in the biological sample is atleast 200% higher than a reference level. In some embodiments, theprotein biomarker in the biological sample is at least 300% higher thana reference level. In some embodiments, the protein biomarker in thebiological sample is at least 400% higher than a reference level. Insome embodiments, the protein biomarker in the biological sample is atleast 500% higher than a reference level. In some embodiments, theprotein biomarker in the biological sample is at least 600% higher thana reference level. In some embodiments, the protein biomarker in thebiological sample is at least 700% higher than a reference level. Insome embodiments, the protein biomarker in the biological sample is atleast 800% higher than a reference level. In some embodiments, theprotein biomarker in the biological sample is at least 900% higher thana reference level. In some embodiments, the protein biomarker in thebiological sample is at least 1000% higher than a reference level. Insome embodiments, the protein biomarker in the biological sample isbetween about 5% to about 1000% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 10% to about 1000% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 15% to about 1000% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 20% to about 1000% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 25% to about 1000% higher than a reference level. 
In someembodiments, the protein biomarker in the biological sample is betweenabout 30% to about 1000% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 10% to about 100% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 15% to about 100% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 20% to about 100% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 25% to about 100% higher than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 30% to about 100% higher than a reference level.

In some embodiments, the protein biomarker is at least 5% lower than areference level. In some embodiments, the protein biomarker is at least10% lower than a reference level. In some embodiments, the proteinbiomarker is at least 15% lower than a reference level. In someembodiments, the protein biomarker is at least 20% lower than areference level. In some embodiments, the protein biomarker is at least25% lower than a reference level. In some embodiments, the proteinbiomarker is at least 30% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is at least40% lower than a reference level. In some embodiments, the proteinbiomarker in the biological sample is at least 50% lower than areference level. In some embodiments, the protein biomarker in thebiological sample is at least 60% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is at least70% lower than a reference level. In some embodiments, the proteinbiomarker in the biological sample is at least 80% lower than areference level. In some embodiments, the protein biomarker in thebiological sample is at least 90% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 5% to about 100% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 10% to about 100% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 15% to about 100% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 20% to about 100% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 25% to about 100% lower than a reference level. In someembodiments, the protein biomarker in the biological sample is betweenabout 30% to about 100% lower than a reference level.

In some embodiments, the protein biomarker is a cytokine biomarker. In some embodiments, the protein biomarker is a chemokine biomarker. In some embodiments, the protein biomarker is a growth factor biomarker. In some embodiments, the protein biomarker is associated with inflammation. In some embodiments, the protein biomarker is associated with cancer. In some embodiments, the protein biomarker is associated with a particular type of cancer. Any appropriate cancer can be identified and/or treated as described herein. In some embodiments, the cancer is a common cancer. In some embodiments, the cancer is a cancer where no blood-based test is available. In some embodiments, the cancer is a cancer where no test for early detection is available. In some embodiments, the cancer is a Stage I cancer. In some embodiments, the cancer is a Stage II cancer. In some embodiments, the cancer is a Stage III cancer. In some embodiments, the cancer is a Stage IV cancer. In some embodiments, the cancer is a surgically resectable cancer. In some embodiments, the cancer is a surgically unresectable cancer. Examples of cancers that can be identified as described herein (e.g., based at least in part on the presence or absence of one or more first biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more second biomarkers (e.g., peptide biomarkers)) and/or the presence of aneuploidy include, without limitation, liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, and prostate cancer.

In some embodiments, the levels of one or more protein biomarkers can be detected independently (e.g., via singleplex peptide tools). Examples of methods for detecting protein levels include, without limitation, spectrometry methods (e.g., high-performance liquid chromatography (HPLC) and liquid chromatography-mass spectrometry (LC/MS)), antibody dependent methods (e.g., enzyme-linked immunosorbent assay (ELISA), protein immunoprecipitation, immunoelectrophoresis, western blotting, and protein immunostaining), and aptamer dependent methods. In some embodiments, the level of one or more protein biomarkers can be detected as described in the Examples.

Many of the singleplex peptide tools, such as, but not limited to, ELISAs or western blotting, can be used sequentially or concurrently to analyze multiple peptide biomarkers. Multiplex peptide tools can include combining singleplex peptide tools or elements of singleplex peptide tools. Additionally or alternatively, detecting the levels of one or more peptide biomarkers can occur via multiplex peptide tools such as “chips,” microarrays, or immunoassay systems. In one non-limiting example, multiple analytes can be probed by multiple capture antibodies spotted on microarrays and analyzed via a horseradish peroxidase (HRP)-conjugated antibody/chemiluminescence system. In this method, each spot captures a specific target protein, and a second, target-specific detector antibody is used for quantification. In addition to membrane antibody arrays, glass slides may be used for quantitative antibody arrays. Commercial embodiments of multiplexed ELISAs include, but are not limited to, Q-Plex available from Quansys Biosciences, Mosaic™ available from R&D Systems, Ciraplex® available from Aushon Biosystems, MULTI-ARRAY available from Meso Scale Discovery, FAST Quant available from Whatman Schleicher & Schuell BioScience, A² available from Beckman Coulter, and Quantibody® available from RayBiotech. Additionally or alternatively, multiplex assays can utilize beads or particles to detect one or more protein biomarkers simultaneously. In one non-limiting example, polystyrene or paramagnetic beads impregnated with dyes of differing wavelengths are used to detect multiple target antibodies simultaneously, wherein each dye or dye combination corresponds with a different target antibody. Sandwich assays are used to measure protein levels. Commercial embodiments of this technique utilize the Luminex® and FirePlex® technology platforms, e.g., Bio-Plex® Multiplex Immunoassay System available from Bio-Rad, FlowCytomix available from eBioscience, ProcartaPlex Immunoassay System available from ThermoFisher Scientific, and Novex® Multiplex Assays available from Invitrogen.

In some embodiments, an assay includes detection of thresholded proteinbiomarkers in a biological sample (e.g., any biological sample describedherein such as, without limitation, blood or plasma) without detectionof genetic biomarkers (e.g., mutations in circulating tumor DNA (ctDNA))and/or aneuploidy and/or an additional class of biomarker. In someembodiments, an assay includes detection of thresholded proteinbiomarkers in a biological sample (e.g., any biological sample describedherein such as, without limitation, blood or plasma) with detection ofgenetic biomarkers (e.g., mutations in circulating tumor DNA (ctDNA))and/or aneuploidy and/or an additional class of biomarker. For example,an assay may include detection of one or more of (e.g., each of) CA19-9,CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO) ina biological sample. In some embodiments, an assay may include detectionof one or more of (e.g., each of) CA19-9, CEA, HGF, OPN, CA125,prolactin, TIMP-1, and/or myeloperoxidase (MPO) in a biological sampleat any of the threshold levels disclosed herein. As another example, anassay may include detection of one or more of (e.g., each of) CA19-9,CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/orCA15-3 in a biological sample. In some embodiments, an assay may includedetection of one or more of (e.g., each of) CA19-9, CEA, HGF, OPN,CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3 in abiological sample at any of the threshold levels disclosed herein. Asanother example, an assay may include detection of one or more of (e.g.,each of) CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/orCA15-3 in a biological sample. In some embodiments, an assay may includedetection of one or more of (e.g., each of) CA19-9, CEA, HGF, OPN,CA125, AFP, prolactin, TIMP-1, and/or CA15-3 in a biological sample atany of the threshold levels disclosed herein. 
As another example, anassay may include detection of one or more of (e.g., each of) KRAS(e.g., codons 12 and 61), TP53, CDKN2A, and/or SMAD4 in a biologicalsample. In some embodiments, an assay may include detection of one ormore of (e.g., each of) KRAS (e.g., codons 12 and 61), TP53, CDKN2A,and/or SMAD4 in a biological sample at any of the threshold levelsdisclosed herein. In some embodiments, once an assay that includesdetection of thresholded protein biomarkers in a biological sample isperformed, subsequent testing or monitoring is performed (e.g., any ofthe variety of further diagnostic testing or increased monitoringtechniques disclosed herein). In some embodiments, once an assay thatincludes detection of thresholded protein biomarkers in a biologicalsample is performed, a second assay that includes detecting the presenceof one or more genetic biomarkers present in cell-free DNA (e.g., ctDNA,e.g., any of the variety of genetic alterations that are present incell-free DNA or ctDNA as described herein), the presence of one or moreprotein biomarkers (e.g., any of the variety of protein biomarkersdescribed herein), the presence of aneuploidy, and/or the presence ofone or more additional classes of biomarkers can be performed.

Examples of protein biomarkers that can be detected using any of thevariety of techniques described herein include, without limitation,Actin gamma (ACTG1), AFP, Alpha-2-HS glycol protein, Angiopoietin-2,apolipo protein 1, AXL, CA125, CA15-3, carbohydrate antigen 19-9(CA19-9), carcinoembryonic antigen (CEA), Catenin, Caveolin-1, CD44,class B member 1 (HSP90AB1), complement c3a, Cyclin D, CYFRA 21-1,Defensin α 6, DKK1, EAFP, Endoglin, Eukaryotic translation elongationfactor 1 gamma (EEF1G), Ferritin, FGF2, follistatin, Galectin-3, G-CSF,GDF15, Glucose regulated protein-8, Glyceraldehyde-3-phosphatedehydrogenase (GAPDH), HE4, Heat shock protein 90 kDa alpha (cytosolic),heavy polypeptide 1 (FTH1), hepatocyte growth factor (HGF), IL-6, IL-8,Kallikrein 6, Lamin A/C filament protein, large subunit, Leptin, lightpolypeptide (FTL), LRG-1, Mesothelin, Midkine, MMPs, muscle (PKM2),Myeloperoxidase, NSE, OPG, osteopontin (OPN), P0 (RPLP0), PAR,prolactin, Pyruvate kinase, Ribosomal protein, Ribosomal protein L3(RPL3), Ribosomal protein large subunit P0 (RPLP0), S100 P, sEGFR, SerumC-peptide, sFas, SHBG, sHER2/sEGFR2/sErbB2, sPECAM-1, TGFa, thioredoxinlike protein-2, Thrombospondin-2, TIMP-2, Transferrin, Translationelongation factor (EEF1A1), Txl-2 (thioredoxin like protein-2),Vitronectin, a defensing-1,-2,-3, and/or α-1 antitrypsin. Exemplaryprotein biomarkers detected in various cancer types are shown in Example2.

Detecting Aneuploidy

Aneuploidy is the presence of an abnormal number of chromosomes in a cell. Aneuploidy usually originates during cell division when the chromosomes do not separate properly between the two cells. Aneuploidy occurs as the result of a weakened mitotic checkpoint, as these checkpoints tend to arrest or delay cell division until all components of the cell are ready to enter the next phase. If a checkpoint is weakened, the cell may fail to notice that a chromosome pair is not lined up on the mitotic plate. In such a case, most chromosomes would separate normally (with one chromatid ending up in each cell), while others could fail to separate at all. This would generate a daughter cell lacking a copy and a daughter cell with an extra copy. Aneuploidy has been consistently observed in many cancers.

Aneuploidy can be detected through karyotyping, a process in which a sample of cells is fixed and stained to create the typical light and dark chromosomal banding pattern and a picture of the chromosomes is analyzed. Other non-limiting techniques for detecting aneuploidy include, e.g., Fluorescence In Situ Hybridization (FISH), quantitative PCR of Short Tandem Repeats, quantitative fluorescence PCR (QF-PCR), quantitative PCR dosage analysis, Quantitative Mass Spectrometry of Single Nucleotide Polymorphisms, Comparative Genomic Hybridization (CGH), microarrays, Sanger sequencing, and massively parallel sequencing methods.

The present disclosure provides methods to detect aneuploidy. For example, the present disclosure provides methods and materials for evaluating sequencing data to identify a mammal as having a disease associated with one or more chromosomal anomalies (e.g., cancer). The sequencing data can be processed to identify significant single chromosomal arm gains or losses, as well as allelic imbalance on chromosome arms, using the Within-Sample AneupLoidy DetectiOn (WALDO) method. WALDO incorporates a support vector machine (SVM) to discriminate between aneuploid and euploid samples. The SVM can be trained using aneuploid samples (e.g., synthetic aneuploid samples) and euploid samples (e.g., peripheral white blood cell (WBC) samples). A sample can be scored as positive (aneuploid) if the SVM discriminant score exceeds a given threshold. In some embodiments, a single primer pair is used to amplify ~38,000 loci of long interspersed nucleotide elements (LINEs) throughout the genome. Massively parallel sequencing is then performed. In some embodiments, one of the primers includes a UID as a molecular barcode, which can be used to reduce error rates associated with PCR and sequencing.

Overview of WALDO

In euploid samples, the number of LINE reads within each 500-kb genomic interval should track with the number of reads in certain other genomic regions. Genomic intervals that track together do so because the amplicons within them amplify to similar extents. Here, these genomic regions that track together are called “clusters.” Clusters can be defined from sequencing data on euploid samples. In a test sample, it is determined whether the number of reads in each genomic interval in each predefined cluster is within the expected bound of the other clusters from that same sample. If the reads within a genomic interval are outside the statistically expected bound, and there are many such outliers on the same chromosome arm, then that chromosome arm is classified as aneuploid.

In brief, while the number of reads at each LINE is not randomly distributed across the genome, the distribution of scaled reads within each cluster is approximately normal. A convenient property of normal distributions is that the sum of multiple independent normally distributed variables is also normally distributed. The theoretical mean and variance of the summed reads on each chromosome arm can be computed simply by summing the means and variances of all of the clusters represented on that chromosome arm.

WALDO employs several methods that make it applicable to the analysis of PCR-generated amplicons from clinical samples. One of these methods is controlling amplification bias stemming from the strong dependence of the data on the size of the initial template. Another is the use of a Support Vector Machine (SVM) to enable the detection of aneuploidy in samples containing low neoplastic fractions.

As shown in FIG. 36, a single primer pair is used to amplify LINEs. A test sample is then matched to several euploid samples with genomic DNA of similar size. The genome is divided into multiple intervals, and each interval has a similar size (e.g., 100, 200, 300, 400, 500, 600, 700, 800, 900 Kb, or 1 Mb, 2 Mb, 3 Mb, 4 Mb, or 5 Mb). The reads within these genomic intervals in the euploid samples are grouped into clusters. All of the genomic intervals in the clusters have similar read depths. The reads from each of the genomic intervals in the test sample are placed into the predefined clusters. Statistical tests, e.g., an SVM-based algorithm, are used to determine whether the total reads from all of the genomic intervals on each chromosome arm are distributed as expected if the sample is euploid. The statistical tests are based on the observed distribution of reads within the clusters of the test sample, not on comparison with the reads in euploid samples. Germline sequence variants at sites of known common polymorphisms within the LINEs provide information about arm-level allelic imbalance that can also be used to assess aneuploidy of individual chromosome arms. These same polymorphisms can be used to determine whether any two samples are derived from the same individual. When there is a matched normal sample from the same individual available, the methods described herein can detect the number and nature of single base substitutions and insertions and deletions within the LINEs.

Fast-SeqS

For each DNA sample evaluated, FAST-SeqS can be used to amplify approximately 38,000 amplicons with a single primer pair (Kinde I, Papadopoulos N, Kinzler K W, & Vogelstein B (2012) FAST-SeqS: a simple and efficient method for the detection of aneuploidy by massively parallel sequencing. PloS ONE 7(7):e41162). Massively parallel sequencing can be performed. In some embodiments, degenerate bases at the 5′ end of the primer are used as molecular barcodes to uniquely label each DNA template molecule. Thus, each DNA template molecule will be counted only once. In some embodiments, each unique read can be sequenced between 1 and 20 times (e.g., about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 times).
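By way of non-limiting illustration, counting each template molecule only once by means of its molecular barcode can be sketched as follows. The function name `count_unique_templates` and the representation of a read as a (barcode, locus) pair are assumptions made for the example; sequencing errors within the barcode itself are not handled.

```python
def count_unique_templates(reads):
    """Count distinct molecular barcodes (UIDs) observed at each locus.

    reads: iterable of (uid, locus) pairs, one pair per sequenced read.
    Returns a dict mapping locus -> number of distinct UIDs, so that a
    template sequenced many times contributes a single count.
    """
    uids_by_locus = {}
    for uid, locus in reads:
        uids_by_locus.setdefault(locus, set()).add(uid)
    return {locus: len(uids) for locus, uids in uids_by_locus.items()}
```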

Sample Alignment and Genomic Interval Grouping

Alignment programs (e.g., Bowtie2) can be used to align reads to a human reference genome assembly (e.g., GRCh37). Exact matches to the reference genome can be identified. These exact matches allow inclusion of common polymorphisms. In light of experimental and stochastic variation, the number of reads that map to each genomic region of any euploid sample is expected to be variable. In some embodiments, to minimize this variability, clusters of genomic intervals with similar read depth across all chromosomes in multiple euploid samples are identified. This step can estimate the expected variability in read depth in a sample when no aneuploidy is present. In some embodiments, the genomic interval has a size of 100, 200, 300, 400, 500, 600, 700, 800, 900 Kb, or 1 Mb, 2 Mb, 3 Mb, 4 Mb, or 5 Mb. In some embodiments, the genomic interval has a size of 500 kb.

Clustering of the genomic intervals can be performed as follows. Each test sample is matched to euploid samples that have similar amplicon sizes. This is important because smaller amplicons can be over-represented in the amplicons generated from DNA that is of small size prior to amplification. The euploid samples can be derived from WBC or plasma DNA from normal individuals, collectively termed the “euploid reference set”. The euploid samples can include at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, or 200 samples. In some embodiments, the euploid samples do not include more than 5, 6, 7, 8, 9, or 10 samples.

For each test sample p, the euploid samples with the smallest Euclidean distance to p are selected, the distance being defined as

$D(p,q) = \sqrt{\sum\limits_{n}{(q_{n} - p_{n})}^{2}}$

where p_n and q_n are the fractions of amplicons of size n in samples p and q, and the sum is over all amplicon sizes in the two samples. In some embodiments, before calculating the Euclidean distances between the test sample and the samples in the euploid reference set, the following amplicons are excluded: (i) using maximum likelihood estimates, the amplicons are ranked by variance among the euploid samples and the top 1% are excluded; and (ii) any amplicons with <10 reads in one sample but >50 reads in any of the other samples are removed. In each sample, the genomic intervals are scaled by subtracting the mean and dividing by the standard deviation of reads in each sample.
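By way of non-limiting illustration, matching a test sample to the nearest euploid samples by amplicon size distribution can be sketched as follows. The function names, the dictionary representation of a size distribution, and the default of seven matched samples are assumptions made for the example.

```python
import math


def size_distance(p, q):
    """Euclidean distance between two amplicon size distributions.

    p, q: dicts mapping amplicon size n -> fraction of amplicons of that size.
    A size absent from one sample is treated as a fraction of 0.
    """
    sizes = set(p) | set(q)
    return math.sqrt(sum((q.get(n, 0.0) - p.get(n, 0.0)) ** 2 for n in sizes))


def closest_euploid(test, reference_set, k=7):
    """Return the names of the k reference samples nearest to the test sample.

    reference_set: dict mapping sample name -> size distribution.
    k is a hypothetical choice; any small number of samples could be used.
    """
    ranked = sorted(reference_set, key=lambda name: size_distance(test, reference_set[name]))
    return ranked[:k]
```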

The scaled genomic intervals are then clustered across the selected normal samples. First, each genomic interval i is assigned to a primary cluster C_i. Next, the number of reads in genomic interval i across all samples is compared to the average number of reads in the samples in all other genomic intervals i′ that occur on the remaining autosomal chromosomes. If the average number of reads in genomic interval i′ is not significantly different from the number of reads in genomic interval i, interval i′ is added to cluster C_i. This process is repeated for each genomic interval, yielding multiple clusters (e.g., more than 1000, 2000, 3000, 4000, or 5000 clusters). Every interval i belongs to its primary cluster, but the same interval can also belong to some other clusters. In some embodiments, there are about 4300 clusters.

In some embodiments, scaled reads are not randomly distributed. In some embodiments, the distribution of scaled reads within the genomic intervals in each cluster follows an approximately normal distribution.

Identifying Chromosome Arm Gains or Losses in a Test Sample

The methods described herein can use just a few euploid samples (e.g., about 2, 3, 4, 5, 6, 7, 8, 9, or 10 samples) to define clusters of genomic intervals with similar amplification properties. The statistical tests for aneuploidy are based on the read distributions within the test sample and are independent of the read distributions in any euploid sample. In some embodiments, maximum likelihood is used to estimate the means μ and variances σ² of the genomic intervals in each cluster. In some embodiments, the robustness of these estimates can be improved by iteratively removing outlying genomic intervals within the test sample from the clusters. In some embodiments, clusters containing fewer than 10 genomic intervals are not included in the analysis. In some embodiments, for each cluster, any genomic interval meeting the criterion

$\min\left( 2 \cdot \mathrm{CDF}(\mu,\sigma_{i}^{2}),\; 2 \cdot \left( 1 - \mathrm{CDF}(\mu,\sigma_{i}^{2}) \right) \right) < 0.01$

is removed from all clusters. Next, the means μ and variances σ² of each cluster are re-estimated by maximum likelihood. The two steps are repeated until no outlying genomic intervals remain. The statistical significance of the total reads from all genomic intervals on the arm is then estimated. Because sums of normally distributed random variables are also normally distributed random variables, the calculation is straightforward. For each chromosome arm, the following can be calculated:

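$\sum\limits_{i = 1}^{I}R_{i} \sim N\left( {\sum\limits_{i = 1}^{I}\mu_{i}},{\sum\limits_{i = 1}^{I}\sigma_{i}^{2}} \right)$

wherein R_i is the scaled reads and I is the number of clusters on the arm. Z-scores can be produced using the quantile function

$1 - \mathrm{CDF}\left( {\sum\limits_{i = 1}^{I}\mu_{i}},{\sum\limits_{i = 1}^{I}\sigma_{i}^{2}} \right)$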

Positive Z-scores >α represent gains and negative Z-scores <−α represent losses. The α value is the selected significance threshold.
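By way of non-limiting illustration, the arm-level test can be sketched as follows. The function names are hypothetical, and the Z-score is computed here by directly standardizing the summed reads against the summed means and variances, which is one possible reading of the calculation described above and is an assumption made for the example.

```python
import math
from statistics import NormalDist


def arm_z_score(scaled_reads, means, variances):
    """Standardize the summed scaled reads on an arm.

    scaled_reads, means, variances: one entry per cluster on the arm.
    The sum of the reads is compared with a normal distribution whose mean
    and variance are the sums of the per-cluster means and variances.
    """
    return (sum(scaled_reads) - sum(means)) / math.sqrt(sum(variances))


def upper_tail_probability(scaled_reads, means, variances):
    """Return 1 - CDF of the summed reads under the summed normal distribution."""
    dist = NormalDist(mu=sum(means), sigma=math.sqrt(sum(variances)))
    return 1.0 - dist.cdf(sum(scaled_reads))


def classify_arm(z, alpha):
    """Label an arm as a gain, a loss, or unchanged for a chosen threshold alpha."""
    if z > alpha:
        return "gain"
    if z < -alpha:
        return "loss"
    return "no change"
```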

Arm Level Allelic Imbalance

Common polymorphisms from 1000 Genomes (including, e.g., 24,720 single nucleotide and 1,500 indels, MAF>1%) can be used as candidate heterozygous sites. For each of the normal samples, polymorphic sites can be confidently called as heterozygous and diploid. Polymorphisms can be defined as those with variant-allele frequencies (VAF) in the range 0.4<VAF<0.6, where VAF = (number of non-reference reads)/(total reads). VAFs can be modeled at these sites as random variables taken from a normal distribution with μ=0.5, and the variances σ² can be estimated by maximum likelihood as a function of read depth.
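By way of non-limiting illustration, the VAF calculation and the heterozygous-site criterion can be sketched as follows. The function names are hypothetical, and the strict inequalities follow the range 0.4<VAF<0.6 stated above.

```python
def variant_allele_frequency(non_reference_reads, total_reads):
    """VAF = number of non-reference reads divided by total reads at a site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return non_reference_reads / total_reads


def is_heterozygous_call(non_reference_reads, total_reads, low=0.4, high=0.6):
    """Return True when the site's VAF lies strictly between low and high."""
    return low < variant_allele_frequency(non_reference_reads, total_reads) < high
```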

To determine whether the alleles on a chromosome arm in a test sample are unbalanced, the subset of polymorphic sites at which both alleles are present and in which the sum of the reads on both alleles is >25 is identified.

The observed VAF at each site is compared with the normal distribution, using the expected variance for the observed read depth, yielding a two-sided p-value. All p-values on a chromosome arm can be Z-transformed and combined with a weighted Stouffer's method, with the observed read depth at each site used as its weight. The formula used for this calculation is

$Z \sim \frac{\sum\limits_{i = 1}^{k}{w_{i}Z_{i}}}{\sqrt{\sum\limits_{i = 1}^{k}w_{i}^{2}}}$

where w_(i) is the UID depth at variant i, Z_(i) is the Z-score of variant i, and k is the number of variants observed on the chromosome arm. A chromosome arm is scored as having an allelic imbalance if the resulting Z-score is greater than the selected statistical significance threshold α (e.g., by a one-sided test).
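
The site filter, per-site test, and weighted combination described above can be sketched as follows. This is an illustrative sketch rather than the disclosed implementation: the binomial variance 0.25/depth stands in for the depth-dependent variance that the text fits from normal samples, the Z-transform is taken to be the standard normal quantile at 1 − p, and the p-value clamps are numerical guards chosen for this example. All function and constant names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
MIN_DEPTH = 25                        # sites need more than this many reads
P_FLOOR, P_CEIL = 1e-300, 1 - 1e-6    # numerical guards (assumed values)


def binomial_variance(depth):
    """Assumed stand-in for the variance fitted from normal samples."""
    return 0.25 / depth


def site_z(variant_reads, reference_reads, variance_for_depth=binomial_variance):
    """Z-transform of the two-sided p-value for one heterozygous site.

    Returns None when the site fails the depth or both-alleles filter.
    """
    depth = variant_reads + reference_reads
    if variant_reads == 0 or reference_reads == 0 or depth <= MIN_DEPTH:
        return None
    vaf = variant_reads / depth
    sd = sqrt(variance_for_depth(depth))
    # Two-sided p-value, computed from the lower tail for numerical accuracy.
    p = 2 * NormalDist(0.5, sd).cdf(0.5 - abs(vaf - 0.5))
    p = min(max(p, P_FLOOR), P_CEIL)
    # By symmetry this equals the standard normal quantile at 1 - p.
    return -STD_NORMAL.inv_cdf(p)


def weighted_stouffer(z_scores, weights):
    """Combine per-site Z-scores: sum(w*z) / sqrt(sum(w^2))."""
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / sqrt(sum(w * w for w in weights))
```

An arm would then be scored as imbalanced when the combined value from `weighted_stouffer`, with each site's read depth as its weight, exceeds the chosen threshold α.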

Generation of Synthetic Aneuploid Samples

Synthetic aneuploid samples can be created by adding (or subtracting) reads from several chromosome arms to the reads from these normal DNA samples. The reads from 1, 5, 10, 15, 20, or 25 randomly selected chromosome arms are added to or subtracted from each sample. The additions and subtractions are designed to represent neoplastic cell fractions ranging from 0.5% to 10% and result in synthetic samples containing exactly nine million reads. The reads from each chromosome arm can be added or subtracted uniformly. Synthetically generated samples in which reads from only a single chromosome arm are added or subtracted can be used to estimate the performance of WALDO.
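
One way to picture this procedure is the sketch below, which works on per-arm read counts instead of individual reads. It assumes a diploid baseline in which a single-copy gain or loss in a neoplastic fraction f scales an arm's reads by (1 + f/2) or (1 − f/2); that scaling, the largest-remainder rounding used to reach an exact total, and all names are assumptions for illustration and are not taken from the text.

```python
import random


def pick_arm_changes(arms, n_arms, rng):
    """Randomly choose n_arms arms and mark each as a gain (+1) or loss (-1)."""
    return {arm: rng.choice((1, -1)) for arm in rng.sample(list(arms), n_arms)}


def make_synthetic_sample(arm_reads, arm_changes, neoplastic_fraction, target_total):
    """Return per-arm read counts that sum to exactly target_total."""
    # Expected reads per arm after the simulated single-copy change.
    expected = {
        arm: reads * (1 + arm_changes.get(arm, 0) * neoplastic_fraction / 2)
        for arm, reads in arm_reads.items()
    }
    factor = target_total / sum(expected.values())
    exact = {arm: value * factor for arm, value in expected.items()}
    counts = {arm: int(value) for arm, value in exact.items()}
    # Largest-remainder rounding so the counts sum to exactly target_total.
    shortfall = target_total - sum(counts.values())
    by_remainder = sorted(exact, key=lambda arm: exact[arm] - counts[arm], reverse=True)
    for arm in by_remainder[:shortfall]:
        counts[arm] += 1
    return counts
```

For example, `pick_arm_changes(arm_reads, 5, random.Random(1))` followed by `make_synthetic_sample(arm_reads, changes, 0.05, 9_000_000)` would yield a nine-million-read sample with five altered arms at a 5% neoplastic fraction under these assumptions.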

Genome-Wide Aneuploidy Detection

The present disclosure provides genome-wide aneuploidy detection. In some embodiments, a two-class support vector machine (SVM) can be trained to discriminate between euploid samples and the synthetic samples. The training set can contain white blood cell (WBC) samples and samples with aneuploidy (e.g., all synthetic samples).

SVM training can be performed using various statistical software packages (e.g., in R, using a radial basis kernel and default parameters).

The number of reads obtained from experimental samples can vary widely, particularly when the samples are derived from sources with limited amounts of DNA such as plasma. In some embodiments, samples with low read counts can generate artificially high SVM scores if read depth is not taken into account. Thus, read depth can be controlled by modeling the change in SVM scores as a function of read depth in the normal samples. The average ratio r at each depth (the SVM score at a particular read depth divided by the minimum SVM score of the sample given sufficient read depth) decreases monotonically as a function of increasing read depth. Raw SVM scores can therefore be corrected by dividing by the ratio r, where the relation between read depth and r can be modeled using the formula

${\log \left( {1 - \frac{1}{r}} \right)} = {{Ax} + {B.}}$

To score a sample as aneuploid, it is first determined whether any single chromosome arm in it is lost or gained in a statistically significant manner. A statistically significant gain of a single chromosome arm is defined as one whose Z-score is above the maximum Z-score observed in the normal samples (e.g., 1σ, 2σ, 3σ, 4σ, or 5σ above). Similarly, a statistically significant loss of a single chromosome arm is defined as one whose Z-score is below the minimum Z-score observed in the normal samples (e.g., 1σ, 2σ, 3σ, 4σ, or 5σ below). Allelic imbalance based on SNPs can be defined for a chromosome arm whose Z-score is above the maximum Z-score observed in normal samples (e.g., 1σ, 2σ, 3σ, 4σ, or 5σ above). Only samples in which no single chromosome arm is gained or lost when defined in this way are subjected to SVM analysis. The rationale for this process is that the SVM is designed to identify samples with large numbers of chromosome arm gains or losses but relatively low neoplastic cell fractions. In some embodiments, the SVM is not designed to detect aneuploidy in samples with neoplastic cell fractions >10%, which are easily identified through evaluation of their Z-scores and comparison to the normal samples.
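
The two-stage decision described above can be sketched as a short function. This is a hedged illustration only: `margin` expresses the "1σ to 5σ above or below" offset in Z-score units, `svm_score` is assumed to be a depth-corrected score, and `svm_cutoff` is a hypothetical decision threshold that the text does not specify.

```python
def classify_sample(arm_z, normal_arm_z, margin, svm_score, svm_cutoff):
    """Return (label, gained_arms, lost_arms) for one test sample.

    arm_z         -- mapping of chromosome arm name to its Z-score
    normal_arm_z  -- Z-scores observed across the normal samples
    """
    upper = max(normal_arm_z) + margin
    lower = min(normal_arm_z) - margin
    gained = sorted(arm for arm, z in arm_z.items() if z > upper)
    lost = sorted(arm for arm, z in arm_z.items() if z < lower)
    if gained or lost:
        return "aneuploid (arm-level)", gained, lost
    # Only samples without a significant single-arm change reach the SVM step.
    label = "aneuploid (SVM)" if svm_score > svm_cutoff else "euploid"
    return label, gained, lost
```

Routing samples this way keeps the SVM focused on samples with many subtle arm changes, which is the rationale the text gives for the ordering.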

Somatic Sequence Mutations and Microsatellite Instability (MSI)

The present disclosure also provides methods to detect somatic sequence mutations and microsatellite instability. When matched normal samples are available, somatic single base substitution (SBS) mutations and insertion and deletion (indel) mutations can be detected based on LINE amplicon sequences and alignments. In some embodiments, the molecular barcoding approach for error reduction is used. In some embodiments, the SBS mutations can be identified by directly comparing amplicons from the test sample with amplicons from the matched normal sample, which does not require any alignment to the reference genome.

Indels can be called in a similar way. Amplicons from the test sample and the matched normal sample are aligned to the reference genome (GRCh37). In some embodiments, a somatic indel can be called when at least ten reads from the test sample differ from any normal read by virtue of the same insertion or deletion.
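
A minimal sketch of this support-count rule is shown below. It assumes that alignment has already been performed and that each read has been reduced to a set of hashable indel keys such as (chromosome, position, type, sequence); that representation, the reading of the rule as "the same indel is absent from every normal read", and the function name are assumptions for illustration.

```python
from collections import Counter

MIN_SUPPORT = 10   # minimum number of supporting test-sample reads


def call_somatic_indels(test_read_indels, normal_read_indels, min_support=MIN_SUPPORT):
    """Return indels seen in >= min_support test reads and in no normal read.

    Each argument is an iterable holding one collection of indel keys per read.
    """
    normal_seen = set()
    for indels in normal_read_indels:
        normal_seen.update(indels)
    support = Counter()
    for indels in test_read_indels:
        support.update(set(indels))   # count each indel once per read
    return sorted(
        key for key, n in support.items()
        if n >= min_support and key not in normal_seen
    )
```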

Microsatellite instability in a test sample can be determined by counting the number of somatic indels in mononucleotide tracts of >3 nucleotides. Somatic indels in monotracts can be rare in a normal sample. Therefore, the null distribution of counts can be modeled as Poisson (λ=1), where λ is the mean number of somatic indels in a monotract in a normal sample. A sample is determined as harboring MSI if the number of somatic indels is statistically significant. To evaluate how often normal samples can be scored as MSI using this process, the total reads in normal samples can be randomly split into two equal partitions. The first partition can be used as the reference sample and the second partition can be used as a test sample.
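
The Poisson test described above can be illustrated with the following sketch, which computes the upper-tail probability of observing at least k somatic monotract indels under the null model with λ=1. The significance level `alpha` is a hypothetical parameter, since the text does not state one, and the function names are illustrative.

```python
from math import exp, factorial


def poisson_upper_tail(k, lam=1.0):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(exp(-lam) * lam ** j / factorial(j) for j in range(k))
    return max(0.0, 1.0 - below)


def has_msi(n_somatic_monotract_indels, alpha, lam=1.0):
    """Flag a sample as MSI when the indel count is significant at alpha."""
    return poisson_upper_tail(n_somatic_monotract_indels, lam) < alpha
```

The split-partition check in the text could reuse `has_msi` by counting apparent somatic indels between the two halves of a normal sample and recording how often the result is a false call.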

This document provides methods and materials for identifying one or more chromosomal anomalies (e.g., aneuploidies) in a sample. For example, a mammal (e.g., a sample obtained from a mammal) can be assessed for the presence or absence of one or more chromosomal anomalies. In some cases, this document provides methods and materials for using amplicon-based sequencing data to identify a mammal as having a disease associated with one or more chromosomal anomalies (e.g., cancer). For example, the methods and materials described herein can be applied to a sample obtained from a mammal to identify the mammal as having one or more chromosomal anomalies. For example, methods and materials described herein can be applied to a sample obtained from a mammal to identify the mammal as having a disease associated with one or more chromosomal anomalies (e.g., cancer). This document also provides methods and materials for identifying and/or treating a disease or disorder associated with one or more chromosomal anomalies (e.g., one or more chromosomal anomalies identified as described herein). In some cases, the one or more chromosomal anomalies can be identified in DNA (e.g., genomic DNA) obtained from a sample obtained from a mammal. For example, a prenatal mammal (e.g., prenatal human) can be identified as having a disease or disorder based, at least in part, on the presence of one or more chromosomal anomalies, and, optionally, can be treated with one or more treatments for the disease or disorder. For example, a mammal identified as having cancer based, at least in part, on the presence of one or more chromosomal anomalies can be treated with one or more cancer treatments.

Any appropriate mammal can be assessed and/or treated as described herein. A mammal can be a prenatal mammal (e.g., prenatal human). A mammal can be a mammal suspected of having a disease associated with one or more chromosomal anomalies (e.g., cancer). In some cases, humans or other primates such as monkeys can be assessed for the presence of one or more chromosomal anomalies as described herein. In some cases, dogs, cats, horses, cows, pigs, sheep, mice, and rats can be assessed for the presence of one or more chromosomal anomalies as described herein. For example, a human can be assessed for the presence of one or more chromosomal anomalies as described herein and, optionally, can be treated with one or more cancer treatments as described herein.

Any appropriate sample from a mammal can be assessed as described herein (e.g., assessed for the presence of one or more chromosomal anomalies). A sample can include genomic DNA. In some cases, a sample can include cell-free circulating DNA (e.g., cell-free circulating fetal DNA). In some cases, a sample can include circulating tumor DNA (ctDNA). Examples of samples that can contain DNA include, without limitation, blood (e.g., whole blood, serum, or plasma), amnion, tissue, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, pap smears, cerebral spinal fluid, endo-cervical, endometrial, and fallopian samples. For example, a sample can be a plasma sample. For example, a sample can be a urine sample. For example, a sample can be a saliva sample. For example, a sample can be a cyst fluid sample. For example, a sample can be a sputum sample. In some cases, a sample can include a neoplastic cell fraction (e.g., a low neoplastic cell fraction). In cases where a sample includes a low neoplastic cell fraction, the neoplastic cell fraction can be from about 0.01% to about 10% (e.g., from about 0.05% to about 10%, from about 0.5% to about 10%, from about 1% to about 10%, from about 3% to about 10%, from about 5% to about 10%, from about 7% to about 10%, from about 0.01% to about 8%, from about 0.01% to about 5%, from about 0.01% to about 2%, from about 0.01% to about 1%, from about 0.01% to about 0.5%, from about 0.05% to about 8%, from about 0.1% to about 4%, or from about 0.5% to about 1%) of the cell content of the entire sample. For example, a sample that includes a low neoplastic cell fraction can be about 1% neoplastic cells. For example, a sample that includes a low neoplastic cell fraction can be about 0.5% neoplastic cells.

In some cases, a sample can be processed to isolate and/or purify DNA from the sample. In some cases, DNA isolation and/or purification can include cell lysis (e.g., using detergents and/or surfactants). In some cases, DNA isolation and/or purification can include removing proteins (e.g., using a protease). In some cases, DNA isolation and/or purification can include removing RNA (e.g., using an RNase).

Methods and materials for identifying one or more chromosomal anomalies can include assessing a genome (e.g., a genome of a mammal) for the presence or absence of one or more chromosomal anomalies (e.g., aneuploidies). The presence or absence of one or more chromosomal anomalies in the genome of a mammal can, for example, be determined by sequencing a plurality of amplicons obtained from a sample obtained from the mammal to obtain sequencing reads, and grouping the sequencing reads into clusters of genomic intervals. In some cases, read counts of genomic intervals can be compared to read counts of other genomic intervals within the same sample. In some cases where read counts of genomic intervals are compared to read counts of other genomic intervals within the same sample, a second (e.g., control or reference) sample is not assayed. For example, when using methods and materials described herein to identify numerical disorders (e.g., aneuploidy) and/or structural abnormalities, read counts of genomic intervals can be compared to read counts of other genomic intervals within the same sample. In some cases, read counts of genomic intervals can be compared to read counts of genomic intervals in another sample. For example, when using the methods and materials described herein to identify genetic relatedness, polymorphisms (e.g., somatic mutations), and/or microsatellite instability, read counts of genomic intervals can be compared to read counts of genomic intervals in a reference sample. A reference sample can be a synthetic sample. A reference sample can be from a database. In some cases where the methods and materials described herein are used to identify genetic relatedness, a reference sample can be a forensic sample. In some cases where methods and materials described herein are used to identify genetic relatedness, a reference sample can be obtained from a suspected relation. In some cases where the methods and materials described herein are used to identify anomalies (e.g., aneuploidies), one or more polymorphisms (e.g., somatic mutations), and/or microsatellite instability, a reference sample can be a normal sample obtained from the same cancer patient (e.g., a sample from the cancer patient that does not harbor cancer cells) or a normal sample from another source (e.g., a patient that does not have cancer).

In some cases, methods and materials described herein can be used for detecting aneuploidy in a genome of a mammal. For example, a plurality of amplicons obtained from a sample obtained from a mammal can be sequenced, the sequencing reads can be grouped into clusters of genomic intervals, the sums of the distributions of the sequencing reads in each genomic interval can be calculated, a Z-score of a chromosome arm can be calculated, and the presence or absence of an aneuploidy in the genome of the mammal can be identified. The distributions of the sequencing reads in each genomic interval can be summed. For example, sums of distributions of the sequencing reads in each genomic interval can be calculated using the equation $\sum_{i = 1}^{I} R_{i} \sim N\left( \sum_{i = 1}^{I}\mu_{i},\ \sum_{i = 1}^{I}\sigma_{i}^{2} \right)$, where R_(i) is the number of sequencing reads, I is the number of clusters on a chromosome arm, N is a Gaussian distribution with parameters μ_(i) and σ_(i)², μ_(i) is the mean number of sequencing reads in each genomic interval, and σ_(i)² is the variance of sequencing reads in each genomic interval. A Z-score of a chromosome arm can be calculated using any appropriate technique. For example, a Z-score of a chromosome arm can be calculated using the quantile function $1 - \mathrm{CDF}\left( \sum_{i = 1}^{I}\mu_{i},\ \sum_{i = 1}^{I}\sigma_{i}^{2} \right)$. The presence of an aneuploidy in the genome of the mammal can be identified when the Z-score is outside a predetermined significance threshold, and the absence of an aneuploidy in the genome of the mammal can be identified when the Z-score is within a predetermined significance threshold. The predetermined threshold can correspond to the confidence in the test and the acceptable number of false positives. For example, a significance threshold can be ±1.96, ±3, or ±5.

In some cases, methods and materials described herein can be used for detecting one or more polymorphisms in a genome of a mammal. For example, a plurality of amplicons obtained from a sample obtained from a first mammal (e.g., a test mammal or a mammal suspected of harboring one or more polymorphisms) can be sequenced, a plurality of amplicons obtained from a sample obtained from a second mammal (e.g., a reference mammal) can be sequenced, variant sequencing reads from the sample obtained from the first mammal can be grouped into clusters of genomic intervals, reference sequencing reads from the sample obtained from the second mammal can be grouped into clusters of genomic intervals, a chromosome arm having a sum of the variant sequencing reads and the reference sequencing reads on both alleles that is greater than about 3 (e.g., greater than about 4, greater than about 5, greater than about 6, greater than about 7, greater than about 8, greater than about 9, greater than about 10, greater than about 12, greater than about 15, greater than about 18, greater than about 20, greater than about 22, greater than about 25, or greater than about 30) can be selected, a variant-allele frequency (VAF) of the selected chromosome arm can be determined, and the presence or absence of one or more polymorphisms on the selected chromosome arm can be identified. A VAF of the selected chromosome arm can be determined using any appropriate technique. For example, a VAF of the selected chromosome arm can be the number of variant sequencing reads/total number of sequencing reads. The presence of one or more polymorphisms in the genome of the mammal can be identified when the VAF is between about 0.2 and about 0.8 (e.g., between about 0.3 and about 0.8, between about 0.4 and about 0.8, between about 0.5 and about 0.8, between about 0.6 and about 0.8, between about 0.2 and about 0.7, between about 0.2 and about 0.6, between about 0.2 and about 0.5, or between about 0.2 and about 0.4), and the absence of one or more polymorphisms in the genome of the mammal can be identified when the VAF is within a predetermined significance threshold. As one non-limiting example, the presence of one or more polymorphisms in the genome of the mammal can be identified when the VAF is between about 0.4 and about 0.6.

Methods and materials for identifying one or more chromosomal anomalies as described herein can include using amplicon-based sequencing reads. For example, a plurality of amplicons (e.g., amplicons obtained from a sample obtained from the mammal) can be sequenced. In some cases, each amplicon can be sequenced between about 1 and about 20 (e.g., between about 1 and about 15, between about 1 and about 12, between about 1 and about 10, between about 1 and about 8, between about 1 and about 5, between about 5 and about 20, between about 7 and about 20, between about 10 and about 20, between about 13 and about 20, between about 3 and about 18, between about 5 and about 16, or between about 8 and about 12) times. In some cases, amplicon-based sequencing reads can include continuous sequencing reads. In some cases, amplicons can include long interspersed nucleotide elements (LINEs). In some cases, amplicon-based sequencing reads can include from about 100,000 to about 25 million (e.g., from about 100,000 to about 20 million, from about 100,000 to about 15 million, from about 100,000 to about 12 million, from about 100,000 to about 10 million, from about 100,000 to about 5 million, from about 100,000 to about 1 million, from about 100,000 to about 750,000, from about 100,000 to about 500,000, from about 100,000 to about 250,000, from about 250,000 to about 25 million, from about 500,000 to about 25 million, from about 750,000 to about 25 million, from about 1 million to about 25 million, from about 5 million to about 25 million, from about 10 million to about 25 million, from about 15 million to about 25 million, from about 200,000 to about 20 million, from about 250,000 to about 15 million, from about 500,000 to about 10 million, from about 750,000 to about 5 million, or from about 1 million to about 2 million) sequencing reads. In some cases, methods of sequencing amplicons include, without limitation, a Fast Aneuploidy Screening Test-Sequencing System (FAST-SeqS). For example, sequencing a plurality of amplicons can include assigning a unique identifier (UID) to each template molecule (e.g., to each amplicon), amplifying each uniquely tagged template molecule to create UID-families, and redundantly sequencing the amplification products. For example, sequencing a plurality of amplicons can include calculating a Z-score of a variant on said selected chromosome arm using the equation

${Z \sim \frac{\sum\limits_{i = 1}^{k}{w_{i}Z_{i}}}{\sqrt{\sum\limits_{i = 1}^{k}w_{i}^{2}}}},$

where w_(i) is the UID depth at a variant i, Z_(i) is the Z-score of variant i, and k is the number of variants observed on the chromosome arm. In some cases, methods of sequencing amplicons can be as described in Example 6. In some cases, methods of sequencing amplicons can be as described elsewhere (see, e.g., US 2015/0051085; and Kinde et al. 2012 PLoS ONE 7:e41162).

In some cases, methods and materials for identifying one or more chromosomal anomalies (e.g., aneuploidies) as described herein can include amplification of a plurality of amplicons. For example, amplification of a plurality of amplicons can be performed using a single primer pair. Methods of amplifying a plurality of amplicons include, without limitation, polymerase chain reaction (PCR) assays.

A plurality of amplicons can include any appropriate number of amplicons. In some cases, a plurality of amplicons can include from about 10,000 to about 1,000,000 (e.g., from about 15,000 to about 1,000,000, from about 25,000 to about 1,000,000, from about 35,000 to about 1,000,000, from about 50,000 to about 1,000,000, from about 75,000 to about 1,000,000, from about 100,000 to about 1,000,000, from about 125,000 to about 1,000,000, from about 160,000 to about 1,000,000, from about 180,000 to about 1,000,000, from about 200,000 to about 1,000,000, from about 300,000 to about 1,000,000, from about 500,000 to about 1,000,000, from about 750,000 to about 1,000,000, from about 10,000 to about 800,000, from about 10,000 to about 500,000, from about 10,000 to about 250,000, from about 10,000 to about 150,000, from about 10,000 to about 100,000, from about 10,000 to about 75,000, from about 10,000 to about 50,000, from about 10,000 to about 40,000, from about 10,000 to about 30,000, or from about 10,000 to about 20,000) amplicons. As one non-limiting example, a plurality of amplicons can include about 38,000 amplicons. Amplicons in a plurality of amplicons can be any appropriate length. In some cases, an amplicon can include from about 50 to about 140 (e.g., from about 60 to about 140, from about 76 to about 140, from about 90 to about 140, from about 100 to about 140, from about 130 to about 140, from about 50 to about 130, from about 50 to about 120, from about 50 to about 110, from about 50 to about 100, from about 50 to about 90, from about 50 to about 80, from about 60 to about 130, from about 70 to about 125, from about 80 to about 120, or from about 90 to about 100) nucleotides. As one non-limiting example, an amplicon can include about 100 nucleotides.

Methods and materials for identifying one or more chromosomal anomalies as described herein can include grouping sequencing reads (e.g., from a plurality of amplicons) into clusters (e.g., unique clusters) of genomic intervals. A genomic interval can be included in one or more clusters. In some cases, a genomic interval can belong to from about 100 to about 252 (e.g., from about 125 to about 252, from about 150 to about 252, from about 175 to about 252, from about 200 to about 252, from about 225 to about 252, from about 100 to about 250, from about 100 to about 225, from about 100 to about 200, from about 100 to about 175, from about 100 to about 150, from about 125 to about 225, from about 150 to about 200, or from about 160 to about 180) clusters. As one non-limiting example, a genomic interval can belong to about 176 clusters. Each cluster can include any appropriate number of genomic intervals. In some cases, each cluster can include the same number of genomic intervals. In some cases, clusters can include varying numbers of genomic intervals. As one non-limiting example, each cluster can include about 200 genomic intervals.

A cluster of genomic intervals can include any appropriate number of genomic intervals. In some cases, a cluster of genomic intervals can include from about 4000 to about 4500 (e.g., from about 4100 to about 4500, from about 4200 to about 4500, from about 4300 to about 4500, from about 4400 to about 4500, from about 4000 to about 4400, from about 4000 to about 4300, from about 4000 to about 4200, from about 4000 to about 4100, from about 4100 to about 4400, or from about 4200 to about 4300) genomic intervals. As one non-limiting example, a cluster of genomic intervals can include about 4361 genomic intervals. A genomic interval can be any appropriate length. For example, a genomic interval can be the length of an amplicon sequenced as described herein. For example, a genomic interval can be the length of a chromosome arm. In some cases, a genomic interval can include from about 100 to about 125,000,000 (e.g., from about 250 to about 125,000,000, from about 500 to about 125,000,000, from about 750 to about 125,000,000, from about 1,000 to about 125,000,000, from about 1,500 to about 125,000,000, from about 2,000 to about 125,000,000, from about 5,000 to about 125,000,000, from about 7,500 to about 125,000,000, from about 10,000 to about 125,000,000, from about 25,000 to about 125,000,000, from about 50,000 to about 125,000,000, from about 100,000 to about 125,000,000, from about 250,000 to about 125,000,000, from about 500,000 to about 125,000,000, from about 100 to about 1,000,000, from about 100 to about 750,000, from about 100 to about 500,000, from about 100 to about 250,000, from about 100 to about 100,000, from about 100 to about 50,000, from about 100 to about 25,000, from about 100 to about 10,000, from about 100 to about 5,000, from about 100 to about 2,500, from about 100 to about 1,000, from about 100 to about 750, from about 100 to about 500, from about 100 to about 250, from about 500 to about 1,000,000, from about 5000 to about 900,000, from about 50,000 to about 800,000, or from about 100,000 to about 750,000) nucleotides. As one non-limiting example, a genomic interval can include about 500,000 nucleotides. Clusters of genomic intervals can be formed using any appropriate method. For example, amplicons of similar size can be clustered. In some cases, clusters of genomic intervals can be formed as described in Example 6.

Methods and materials described herein also can employ supervised machine learning. In some cases, supervised machine learning can detect small changes in one or more chromosome arms. For example, supervised machine learning can detect changes such as chromosome arm gains or losses that are often present in a disease or disorder associated with chromosomal anomalies, such as cancer. In some cases, supervised machine learning can be used to classify samples according to aneuploidy status. For example, supervised machine learning can be employed to make genome-wide aneuploidy calls. In some cases, using a support vector machine (SVM) model can include obtaining an SVM score. An SVM score can be obtained using any appropriate technique. In some cases, an SVM score can be obtained as described elsewhere (see, e.g., Cortes 1995 Machine learning 20:273-297; and Meyer et al. 2015 R package version:1.6-3). At lower read depths, a sample will typically have a higher raw SVM score. Thus, in some cases, raw SVM probabilities can be corrected based on the read depth of a sample using the equation

${{\log\left( {1 - \frac{1}{r}} \right)} = {{Ax} + B}},$

where r is the ratio of the SVM score at a particular read depth to the minimum SVM score of a particular sample given sufficient read depth. A and B can be determined as described in Example 6. For example, A=−7.076×10^−7, x=the number of unique template molecules for the given sample, and B=−1.946×10^−1.
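
Solving the equation above for r gives r = 1/(1 − exp(Ax + B)), which the following sketch uses to correct a raw SVM score. It assumes that the logarithm is the natural logarithm, which the text does not state, and it uses the example values of A and B quoted above; the function names are illustrative.

```python
from math import exp

A = -7.076e-7   # example slope quoted in the text
B = -1.946e-1   # example intercept quoted in the text


def depth_ratio(x, a=A, b=B):
    """Ratio r implied by log(1 - 1/r) = a*x + b (natural log assumed).

    x is the number of unique template molecules for the sample.
    """
    return 1.0 / (1.0 - exp(a * x + b))


def correct_svm_score(raw_score, x, a=A, b=B):
    """Divide the raw SVM score by the depth-dependent ratio r."""
    return raw_score / depth_ratio(x, a, b)
```

With these example constants the ratio falls toward 1 as x grows, so the correction matters most for samples with few unique template molecules.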

Methods and materials described herein can be used to identify any appropriate chromosomal anomaly. Examples of chromosomal anomalies include, without limitation, numerical disorders, structural abnormalities, allelic imbalances, and microsatellite instabilities. A chromosomal anomaly can include a numerical disorder. For example, a chromosomal anomaly can include an aneuploidy (e.g., an abnormal number of chromosomes). In some cases, an aneuploidy can include an entire chromosome. In some cases, an aneuploidy can include part of a chromosome (e.g., a chromosome arm gain or a chromosome arm loss). Examples of aneuploidies include, without limitation, monosomy, trisomy, tetrasomy, and pentasomy. A chromosomal anomaly can include a structural abnormality. Examples of structural abnormalities include, without limitation, deletions, duplications, translocations (e.g., reciprocal translocations and Robertsonian translocations), inversions, insertions, rings, and isochromosomes. Chromosomal anomalies can occur on any chromosome pair (e.g., chromosome 1, chromosome 2, chromosome 3, chromosome 4, chromosome 5, chromosome 6, chromosome 7, chromosome 8, chromosome 9, chromosome 10, chromosome 11, chromosome 12, chromosome 13, chromosome 14, chromosome 15, chromosome 16, chromosome 17, chromosome 18, chromosome 19, chromosome 20, chromosome 21, chromosome 22, and/or one of the sex chromosomes (e.g., an X chromosome or a Y chromosome)). For example, aneuploidy can occur, without limitation, in chromosome 13 (e.g., trisomy 13), chromosome 16 (e.g., trisomy 16), chromosome 18 (e.g., trisomy 18), chromosome 21 (e.g., trisomy 21), and/or the sex chromosomes (e.g., X chromosome monosomy; sex chromosome trisomy such as XXX, XXY, and XYY; sex chromosome tetrasomy such as XXXX and XXYY; and sex chromosome pentasomy such as XXXXX, XXXXY, and XYYYY). For example, structural abnormalities can occur, without limitation, in chromosome 4 (e.g., partial deletion of the short arm of chromosome 4), chromosome 11 (e.g., a terminal 11q deletion), chromosome 13 (e.g., Robertsonian translocation at chromosome 13), chromosome 14 (e.g., Robertsonian translocation at chromosome 14), chromosome 15 (e.g., Robertsonian translocation at chromosome 15), chromosome 17 (e.g., duplication of the gene encoding peripheral myelin protein 22), chromosome 21 (e.g., Robertsonian translocation at chromosome 21), and chromosome 22 (e.g., Robertsonian translocation at chromosome 22).

Methods and materials described herein can be used for identifying and/or treating a disease associated with one or more chromosomal anomalies (e.g., one or more chromosomal anomalies identified as described herein, such as, without limitation, an aneuploidy). In some cases, a DNA sample (e.g., a genomic DNA sample) obtained from a mammal can be assessed for the presence or absence of one or more chromosomal anomalies. For example, a prenatal mammal (e.g., prenatal human) identified as having a disease based, at least in part, on the presence of one or more chromosomal anomalies can be treated with one or more cancer treatments. As another example, a mammal identified as having cancer based, at least in part, on the presence of one or more chromosomal anomalies can be treated with one or more cancer treatments.

In some cases, a mammal identified as having a disease associated with one or more chromosomal anomalies as described herein (e.g., based at least in part on the presence of one or more chromosomal anomalies, such as, without limitation, an aneuploidy) can have the disease diagnosis confirmed using any appropriate method. Examples of methods that can be used to confirm the presence of one or more chromosomal anomalies include, without limitation, karyotyping, fluorescence in situ hybridization (FISH), quantitative PCR of short tandem repeats, quantitative fluorescence PCR (QF-PCR), quantitative PCR dosage analysis, quantitative mass spectrometry of SNPs, comparative genomic hybridization (CGH), whole genome sequencing, and exome sequencing.

Once identified as having a disease associated with one or more chromosomal anomalies as described herein (e.g., based at least in part on the presence of one or more chromosomal anomalies, such as, without limitation, an aneuploidy), a mammal can be treated accordingly. For example, when a mammal is identified as having a cancer associated with one or more chromosomal anomalies as described herein, the mammal can be treated with one or more cancer treatments. The one or more cancer treatments can include any appropriate cancer treatments. A cancer treatment can include surgery. A cancer treatment can include radiation therapy. A cancer treatment can include administration of a pharmacotherapy such as chemotherapy, hormone therapy, targeted therapy, and/or cytotoxic therapy. Examples of cancer treatments include, without limitation, platinum compounds (such as cisplatin or carboplatin), taxanes (such as paclitaxel or docetaxel), albumin bound paclitaxel (nab-paclitaxel), altretamine, capecitabine, cyclophosphamide, etoposide (vp-16), gemcitabine, ifosfamide, irinotecan (cpt-11), liposomal doxorubicin, melphalan, pemetrexed, topotecan, vinorelbine, luteinizing-hormone-releasing hormone (LHRH) agonists (such as goserelin and leuprolide), anti-estrogen therapy (such as tamoxifen), aromatase inhibitors (such as letrozole, anastrozole, and exemestane), angiogenesis inhibitors (such as bevacizumab), poly(ADP)-ribose polymerase (PARP) inhibitors (such as olaparib, rucaparib, and niraparib), external beam radiation therapy, brachytherapy, radioactive phosphorus, and any combinations thereof.

Any appropriate disease associated with one or more chromosomal anomalies as described herein (e.g., based at least in part on the presence of one or more chromosomal anomalies, such as, without limitation, an aneuploidy) can be identified and/or treated as described herein. Examples of diseases and conditions that can be associated with one or more chromosomal anomalies include, without limitation, lung cancer (e.g., small cell lung carcinoma or non-small cell lung carcinoma), papillary thyroid cancer, medullary thyroid cancer, differentiated thyroid cancer, recurrent thyroid cancer, refractory differentiated thyroid cancer, lung adenocarcinoma, bronchioles lung cell carcinoma, multiple endocrine neoplasia type 2A or 2B (MEN2A or MEN2B, respectively), pheochromocytoma, parathyroid hyperplasia, breast cancer, colorectal cancer (e.g., metastatic colorectal cancer), papillary renal cell carcinoma, ganglioneuromatosis of the gastroenteric mucosa, inflammatory myofibroblastic tumor, cervical cancer, acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), cancer in adolescents, adrenal cancer, adrenocortical carcinoma, anal cancer, appendix cancer, astrocytoma, atypical teratoid/rhabdoid tumor, basal cell carcinoma, bile duct cancer, bladder cancer, bone cancer, brain stem glioma, brain tumor, breast cancer, bronchial tumor, Burkitt lymphoma, carcinoid tumor, unknown primary carcinoma, cardiac tumors, cervical cancer, childhood cancers, chordoma, chronic lymphocytic leukemia (CLL), chronic myelogenous leukemia (CML), chronic myeloproliferative neoplasms, colon cancer, colorectal cancer, craniopharyngioma, cutaneous T-cell lymphoma, bile duct cancer, ductal carcinoma in situ, embryonal tumors, endometrial cancer, ependymoma, esophageal cancer, esthesioneuroblastoma, Ewing sarcoma, extracranial germ cell tumor, extragonadal germ cell tumor, extrahepatic bile duct cancer, eye cancer, fallopian tube cancer, fibrous histiocytoma of bone, gallbladder cancer, gastric cancer, gastrointestinal carcinoid tumor, gastrointestinal stromal tumors (GIST), germ cell tumor, gestational trophoblastic disease, glioma, hairy cell tumor, hairy cell leukemia, head and neck cancer, heart cancer, hepatocellular cancer, histiocytosis, Hodgkin's lymphoma, hypopharyngeal cancer, intraocular melanoma, islet cell tumors, pancreatic neuroendocrine tumors, Kaposi sarcoma, kidney cancer, Langerhans cell histiocytosis, laryngeal cancer, leukemia, lip and oral cavity cancer, liver cancer, lung cancer, lymphoma, macroglobulinemia, malignant fibrous histiocytoma of bone, osteocarcinoma, melanoma, Merkel cell carcinoma, mesothelioma, metastatic squamous neck cancer, midline tract carcinoma, mouth cancer, multiple endocrine neoplasia syndromes, multiple myeloma, mycosis fungoides, myelodysplastic syndromes, myelodysplastic/myeloproliferative neoplasms, myelogenous leukemia, myeloid leukemia, multiple myeloma, myeloproliferative neoplasms, nasal cavity and paranasal sinus cancer, nasopharyngeal cancer, neuroblastoma, non-Hodgkin's lymphoma, non-small cell lung cancer, oral cancer, oral cavity cancer, lip cancer, oropharyngeal cancer, osteosarcoma, ovarian cancer, pancreatic cancer, hepatobiliary cancer, upper urinary tract cancer, papillomatosis, paraganglioma, paranasal sinus and nasal cavity cancer, parathyroid cancer, penile cancer, pharyngeal cancer, pheochromocytoma, pituitary cancer, plasma cell neoplasm, pleuropulmonary blastoma, pregnancy and breast cancer, primary central nervous system lymphoma, primary peritoneal cancer, prostate cancer, rectal cancer, renal cell cancer, retinoblastoma, rhabdomyosarcoma, salivary gland cancer, sarcoma, Sezary syndrome, skin cancer, small cell lung cancer, small intestine cancer, soft tissue sarcoma, squamous cell carcinoma, squamous neck cancer, stomach cancer, T-cell lymphoma, testicular cancer, throat cancer, thymoma and thymic carcinoma, thyroid cancer, transitional cell cancer of the renal pelvis and ureter, unknown primary carcinoma, urethral cancer, uterine cancer, uterine sarcoma, vaginal cancer, vulvar cancer, Waldenstrom Macroglobulinemia, Wilms' tumor, 1p36 deletion syndrome, 1q21.1 deletion syndrome, 2q37 deletion syndrome, Wolf-Hirschhorn syndrome, Cri du chat, 5q deletion syndrome, Williams syndrome, Monosomy 8p, Monosomy 8q, Alfi's syndrome, Kleefstra syndrome, Monosomy 10p, Monosomy 10q, Jacobsen syndrome, Patau syndrome, Angelman syndrome, Prader-Willi syndrome, Miller-Dieker syndrome, Smith-Magenis syndrome, Edwards syndrome, Down syndrome, DiGeorge syndrome, Phelan-McDermid syndrome, 22q11.2 distal deletion syndrome, Cat eye syndrome, XYY syndrome, Triple X syndrome, Klinefelter syndrome, Wolf-Hirschhorn syndrome, Jacobsen syndrome, Charcot-Marie-Tooth disease type 1A, and Lynch Syndrome.

In some embodiments, methods provided herein can be used to detect aneuploidy (e.g., monosomy or trisomy) in a sample (e.g., a cervical sample, an endometrial sample, or a urine sample) obtained from a subject. Aneuploidy can be detected in any region of the genome that is known to be associated with cancer (e.g., endometrial or ovarian cancer). In some embodiments, aneuploidy can be detected in arms 4p, 7q, 8q, and/or 9q. Each of these arms harbors oncogenes and tumor suppressor genes that have been shown to undergo copy number alterations in many cancers, including endometrial or ovarian cancer. In some embodiments, aneuploidy can be detected in arms 5q, 8q, and/or 9p. Each of these arms harbors oncogenes and tumor suppressor genes that have been shown to undergo copy number alterations in many cancers, including bladder cancer. Other appropriate regions for aneuploidy detection, which aneuploidy region(s) are associated with the presence of cancer in a subject, will be known to those of ordinary skill in the art.

In some embodiments, aneuploidy can be detected by amplifying interspersed nucleotide elements. For example, aneuploidy can be detected by amplifying long interspersed nucleotide elements (LINEs). Additionally or alternatively, aneuploidy can be detected by amplifying short interspersed nucleotide elements (SINEs). In some embodiments, aneuploidy can be detected using a technique in which a single PCR is used to co-amplify a plurality of members (e.g., ˜38,000) of a subfamily of long interspersed nucleotide element-1 (L1 retrotransposons, also called LINEs). L1 retrotransposons, like other human repeats, have spread throughout the genome via retrotransposition and are found on all 39 non-acrocentric autosomal arms. In some embodiments, aneuploidy can be detected by any of the variety of methods disclosed in Patent Cooperation Treaty application publication number WO2013148496, the contents of which are incorporated herein by reference in their entirety. Those of ordinary skill in the art will be aware of other suitable methods for detecting aneuploidy. In some embodiments, the sample for detecting aneuploidy is collected using a Pap brush. In some embodiments, the sample for detecting aneuploidy is collected using a Tao brush.
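By way of illustration only, one way amplicon read counts could be summarized per chromosome arm is sketched below. The text above does not specify a statistical procedure; the z-score comparison against euploid reference samples, the threshold of 3.0, and the function names `arm_fractions`, `arm_z_scores`, and `flag_arms` are assumptions made for this sketch and are not the method of WO2013148496.

```python
from statistics import mean, stdev


def arm_fractions(counts):
    """Convert per-arm read counts into fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {arm: n / total for arm, n in counts.items()}


def arm_z_scores(sample_counts, reference_counts):
    """Z-score each arm's read fraction against euploid reference samples.

    reference_counts is a list of per-arm count dicts (at least two samples),
    each covering the same arms as sample_counts.
    """
    sample = arm_fractions(sample_counts)
    refs = [arm_fractions(r) for r in reference_counts]
    scores = {}
    for arm, frac in sample.items():
        ref_vals = [r[arm] for r in refs]
        mu, sigma = mean(ref_vals), stdev(ref_vals)
        # An arm with no reference variation cannot be scored; report 0.0.
        scores[arm] = (frac - mu) / sigma if sigma > 0 else 0.0
    return scores


def flag_arms(scores, threshold=3.0):
    """Return arms whose |z| meets the (assumed) threshold, sorted by name."""
    return sorted(arm for arm, z in scores.items() if abs(z) >= threshold)
```

Because read fractions sum to one, a large gain on one arm lowers the fractions of the remaining arms; with amplicons spread over all 39 arms this effect is diluted, but a practical implementation would likely need to account for it.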

In some embodiments, methods provided herein to detect aneuploidy can be combined with methods to detect the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes selected from the group consisting of: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both (e.g., to determine the presence of an ovarian or endometrial cancer). In some embodiments, methods provided herein to detect aneuploidy can be combined with methods to detect the presence of one or more genetic biomarkers (e.g., mutations) in one or more genes selected from the group consisting of: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A, and the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both (e.g., to determine the presence of an endometrial cancer). In some embodiments, methods provided herein to detect aneuploidy can be combined with methods to detect the presence of one or more genetic biomarkers (e.g., mutations) in TP53, and the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both (e.g., to determine the presence of an ovarian cancer). In some embodiments, combining the detection of aneuploidy with the detection of one or more genetic biomarkers (e.g., mutations) in any of the genes described herein, the detection of genetic biomarkers (e.g., mutations) present in ctDNA, or both can increase the specificity and/or sensitivity of detecting ovarian or endometrial cancer. In some embodiments, the sample is collected using a Pap brush. In some embodiments, the sample is collected using a Tao brush.
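As a non-limiting sketch, the gene panels recited above could be represented and combined with an aneuploidy result as follows. The gene lists are transcribed from the preceding paragraph; the panel labels, the function name `combined_call`, and the rule that a result is positive when either class of evidence is present are assumptions for illustration, since the text does not fix a particular combination rule.

```python
# Gene panels transcribed from the paragraph above; the keys are hypothetical labels.
PANELS = {
    "ovarian_or_endometrial": frozenset({
        "NRAS", "PTEN", "FGFR2", "KRAS", "POLE", "AKT1", "TP53", "RNF43",
        "PPP2R1A", "MAPK1", "CTNNB1", "PIK3CA", "FBXW7", "PIK3R1", "APC",
        "EGFR", "BRAF",
    }),
    "endometrial": frozenset({
        "PTEN", "TP53", "PIK3CA", "PIK3R1", "CTNNB1", "KRAS", "FGFR2",
        "POLE", "APC", "FBXW7", "RNF43", "PPP2R1A",
    }),
    "ovarian": frozenset({"TP53"}),
}


def combined_call(mutated_genes, aneuploidy_detected, panel):
    """Combine panel mutations with an aneuploidy result.

    Assumed rule (not stated in the text): the call is positive when at
    least one panel gene is mutated or aneuploidy is detected.
    """
    hits = sorted(set(mutated_genes) & PANELS[panel])
    return {
        "panel_hits": hits,
        "aneuploidy": bool(aneuploidy_detected),
        "positive": bool(hits) or bool(aneuploidy_detected),
    }
```

For example, `combined_call(["TP53", "BRCA1"], False, "ovarian")` reports TP53 as the only panel hit, because BRCA1 is not in the panel recited for ovarian cancer in this paragraph.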

Cancers

In some embodiments, methods provided herein can be used to detect the presence of cancer (e.g., the presence of a cancer cell) in a subject. In some embodiments, methods provided herein can be used to detect the presence of cancer at an early stage. In some embodiments, methods provided herein for identifying the presence of cancer in a subject with high sensitivity and specificity are performed prior to having determined that the subject already suffers from cancer, prior to having determined that the subject harbors a cancer cell, and/or prior to the subject exhibiting symptoms associated with cancer. In some embodiments, methods provided herein can be used to detect the presence of a genetic biomarker, a protein biomarker, and/or aneuploidy, which genetic biomarker, protein biomarker, and/or aneuploidy is indicative that the subject has cancer (e.g., harbors a cancer cell).

Methods provided herein can be used to detect any type of cancer. In some cases, a cancer can include one or more solid tumors. In some cases, a cancer can be a blood cancer (e.g., can include hematological tumors). Cancer types that can be detected by any of the variety of methods described herein include, without limitation, acute lymphoblastic leukemia (ALL), acute myeloid leukemia (AML), adrenal cancer, adrenocortical carcinoma, AIDS-related cancers, AIDS-related lymphoma, amyotrophic lateral sclerosis or ALS, anal cancer, appendix cancer, astrocytoma, astrocytoma, childhood cerebellar or cerebral, atypical teratoid/rhabdoid tumor, basal cell carcinoma, bile duct cancer, bile duct cancer, extrahepatic (see cholangiocarcinoma), bladder cancer, bone cancer, bone tumor, osteosarcoma/malignant fibrous histiocytoma, brain cancer, glioblastoma, brain stem glioma, brain tumor, brain tumor, cerebellar astrocytoma, brain tumor, cerebral astrocytoma/malignant glioma, brain tumor, ependymoma, brain tumor, medulloblastoma, brain tumor, supratentorial primitive neuroectodermal tumors, brain tumor, visual pathway and hypothalamic glioma, brainstem glioma, breast cancer, bronchial adenomas/carcinoids, bronchial tumor, bronchioles lung cell carcinoma, Burkitt lymphoma, cancer in adolescents, carcinoid tumor, carcinoid tumor, childhood, carcinoid tumor, gastrointestinal, carcinoma of unknown primary, cardiac tumors, central nervous system lymphoma, primary, cerebellar astrocytoma, childhood, cerebral astrocytoma/malignant glioma, childhood, cervical cancer, childhood cancers, chondrosarcoma, chordoma, chronic lymphocytic leukemia (CLL), chronic myelogenous leukemia (CML), chronic myeloproliferative disorders, chronic myeloproliferative neoplasms, colon cancer, colorectal cancer, colorectal cancer (e.g., metastatic colorectal cancer), craniopharyngioma, cutaneous T-cell lymphoma, desmoplastic small round cell tumor, differentiated thyroid cancer, ductal carcinoma in situ, embryonal tumors, endometrial cancer, ependymoma, epithelioid hemangioendothelioma (EHE), esophageal cancer (e.g., esophageal adenocarcinoma or squamous cell carcinoma), esthesioneuroblastoma, Ewing's sarcoma in the Ewing family of tumors, extracranial germ cell tumor, extracranial germ cell tumor, childhood, extragonadal germ cell tumor, extrahepatic bile duct cancer, eye cancer, eye cancer, intraocular melanoma, eye cancer, retinoblastoma, fallopian tube cancer, fibrous histiocytoma of bone, gallbladder cancer, ganglioneuromatosis of the gastroenteric mucosa, gastric (stomach) cancer, gastric (stomach) cancer, gastric carcinoid, gastrointestinal carcinoid tumor, gastrointestinal stromal tumors (GIST), germ cell tumor, germ cell tumor: extracranial, extragonadal, or ovarian, gestational trophoblastic disease, gestational trophoblastic tumor, glioma, glioma of the brain stem, glioma, childhood cerebral astrocytoma, glioma, childhood visual pathway and hypothalamic, hairy cell leukemia, hairy cell tumor, head and neck cancer, heart cancer, hepatocellular (liver) cancer, histiocytosis, Hodgkin's lymphoma, hypopharyngeal cancer, hypothalamic and visual pathway glioma, childhood, inflammatory myofibroblastic tumor, intraocular melanoma, intraocular melanoma, islet cell carcinoma (endocrine pancreas), islet cell tumors, Kaposi sarcoma, kidney cancer (renal cell cancer), Langerhans cell histiocytosis, laryngeal cancer, leukemia, acute lymphoblastic (also called acute lymphocytic leukemia), leukemia, acute myeloid (also called acute myelogenous leukemia), leukemia, chronic lymphocytic (also called chronic lymphocytic leukemia), leukemia, leukemia, chronic myelogenous (also called chronic myeloid leukemia), leukemia, hairy cell, lip and oral cavity cancer, liposarcoma, liver cancer (e.g., hepatocellular carcinoma or cholangiocarcinoma), lung adenocarcinoma, lung cancer (e.g., small cell lung carcinoma, non-small cell lung carcinoma, squamous cell lung cancer, or large cell lung cancer), lymphoma, lymphoma, AIDS-related, lymphoma, Burkitt, lymphoma, cutaneous T-Cell, lymphoma, Hodgkin, lymphoma, primary central nervous system, lymphomas, Non-Hodgkin (an old classification of all lymphomas except Hodgkin's), macroglobulinemia, male breast cancer, malignant fibrous histiocytoma of bone, malignant fibrous histiocytoma of bone/osteosarcoma, medullary thyroid cancer, medulloblastoma, childhood, melanoma, melanoma, intraocular (eye), melanoma, intraocular (eye), Merkel cell cancer, Merkel cell carcinoma, adult malignant, mesothelioma (e.g., malignant pleural mesothelioma), childhood, metastatic squamous neck cancer, metastatic squamous neck cancer with occult primary, midline tract carcinoma, mouth cancer, multiple endocrine neoplasia syndrome, childhood, multiple endocrine neoplasia syndromes, multiple endocrine neoplasia type 2A or 2B (MEN2A or MEN2B, respectively), multiple myeloma, multiple myeloma/plasma cell neoplasm, mycosis fungoides, myelodysplastic syndromes, myelodysplastic/myeloproliferative diseases, myelodysplastic/myeloproliferative neoplasms, myelogenous leukemia, myelogenous leukemia, chronic, myeloid leukemia, myeloid leukemia, adult acute, myeloid leukemia, childhood acute, myeloma, multiple (cancer of the bone-marrow), myeloproliferative disorders, chronic, myeloproliferative neoplasms, myxoma, nasal cavity and paranasal sinus cancer, nasopharyngeal cancer, nasopharyngeal carcinoma, neuroblastoma, oligodendroglioma, oral cancer, oral cavity cancer, oropharyngeal cancer, osteocarcinoma, osteosarcoma, osteosarcoma/malignant fibrous histiocytoma of bone, ovarian cancer, ovarian epithelial cancer (surface epithelial-stromal tumor), ovarian germ cell tumor, ovarian low malignant potential tumor, pancreatic cancer (e.g., pancreatic ductal adenocarcinoma), islet cell, pancreatic neuroendocrine tumors, papillary renal cell carcinoma, papillary thyroid cancer, papillomatosis, paraganglioma, paranasal sinus and nasal cavity cancer, parathyroid cancer, parathyroid hyperplasia, penile cancer, pharyngeal cancer, pheochromocytoma, Phyllodes breast tumors, pineal astrocytoma, pineal germinoma, pineoblastoma and supratentorial primitive neuroectodermal tumors, childhood, pituitary adenoma, pituitary cancer, plasma cell neoplasia/multiple myeloma, plasma cell neoplasm, pleuropulmonary blastoma, pregnancy and breast cancer, primary central nervous system lymphoma, primary peritoneal cancer, prostate cancer, rectal cancer, recurrent thyroid cancer, refractory differentiated thyroid cancer, renal cell cancer, renal cell carcinoma (kidney cancer), renal pelvis and ureter, transitional cell cancer, retinoblastoma, rhabdomyosarcoma, rhabdomyosarcoma, childhood, salivary gland cancer, sarcoma, sarcoma, Ewing family of tumors, Sarcoma, Kaposi, Sezary syndrome, skin cancer, skin cancer (melanoma), skin cancer (non-melanoma), skin carcinoma, Merkel cell, small intestine cancer, soft tissue sarcoma, squamous cell carcinoma, squamous cell carcinoma (see skin cancer (non-melanoma)), squamous neck cancer, squamous neck cancer with occult primary, metastatic, stomach cancer, supratentorial primitive neuroectodermal tumor, childhood, T-cell lymphoma, T-cell lymphoma, cutaneous, testicular cancer, throat cancer, thymoma and thymic carcinoma, Thymoma, childhood, thyroid cancer, thyroid cancer, childhood, transitional cell cancer of the renal pelvis and ureter, trophoblastic tumor, gestational, unknown primary carcinoma, unknown primary site, cancer of, childhood, unknown primary site, carcinoma of, adult, ureter and renal pelvis, transitional cell cancer, urethral cancer, uterine cancer, uterine cancer, endometrial, uterine sarcoma, vaginal cancer, visual pathway and hypothalamic glioma, childhood, vulvar cancer, Waldenstrom macroglobulinemia, and Wilms tumor (kidney cancer).

In some embodiments, methods described herein are used to detect the presence of a single type of cancer. In some embodiments, methods described herein are capable of detecting two or more (e.g., 2, 3, 4, 5, 6, 7, 8, or more) types of cancer. For example, methods described herein can be used to detect the presence of liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, or breast cancer. As another example, methods described herein can be capable of detecting the presence of each of liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, and breast cancer (e.g., methods described herein are capable of detecting the presence of each of these types of cancers in a subject, although only one type of cancer may be present in the subject). In some embodiments, various methods described herein can be used to detect cancers selected from the group consisting of: pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, and breast cancer, and combinations thereof. As another example, methods described herein can be used to detect the presence of cervical, endometrial, ovarian, or fallopian tubal cancers. As another example, methods described herein can be capable of detecting the presence of each of cervical, endometrial, ovarian, and fallopian tubal cancers (e.g., methods described herein are capable of detecting the presence of each of these types of cancers in a subject, although only one type of cancer may be present in the subject). As another example, methods described herein can be used to detect the presence of bladder cancer or an upper-tract urothelial carcinoma (UTUC). As another example, methods described herein can be capable of detecting the presence of each of bladder cancer and an upper-tract urothelial carcinoma (UTUC) (e.g., methods described herein are capable of detecting the presence of each of these types of cancers in a subject, although only one type of cancer may be present in the subject).

Further Diagnostic Testing

In some embodiments of diagnosing or identifying the presence of a disease (e.g., cancer) in a subject (e.g., using any of the variety of methods described herein), the subject is also identified as a candidate for further diagnostic testing. Provided herein are methods for selecting a subject for further diagnostic testing. In some embodiments, methods for selecting a subject for further diagnostic testing include detecting the presence of one or more genetic biomarkers in a biological sample isolated from the subject, detecting the presence of one or more protein biomarkers in a biological sample isolated from the subject, and/or detecting the presence of aneuploidy in a biological sample isolated from the subject, and selecting a subject for further diagnostic testing when the presence of one or more genetic biomarkers, one or more protein biomarkers, or aneuploidy is identified. In some embodiments, methods for selecting a subject for further diagnostic testing further include detecting the presence of one or more members of one or more other classes of biomarkers. In some embodiments, the step of detecting is performed prior to having determined that the subject already suffers from cancer (e.g., when the subject is not known to harbor a cancer cell).
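The selection step described above, in which a subject is selected when one or more genetic biomarkers, one or more protein biomarkers, or aneuploidy is identified, can be sketched in a few lines. The function name, argument names, and return structure below are hypothetical and chosen only for illustration.

```python
def select_for_further_testing(genetic_biomarkers, protein_biomarkers, aneuploidy):
    """Apply the "any class present" selection rule from the text.

    genetic_biomarkers and protein_biomarkers are collections of detected
    biomarker names (possibly empty); aneuploidy is a boolean.
    Returns (selected, reasons), where reasons names each positive class.
    """
    reasons = []
    if genetic_biomarkers:
        reasons.append("genetic biomarker")
    if protein_biomarkers:
        reasons.append("protein biomarker")
    if aneuploidy:
        reasons.append("aneuploidy")
    return bool(reasons), reasons
```

Returning the list of positive classes alongside the decision keeps a record of why a subject was selected, which may be useful when choosing among the further diagnostic tests described herein.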

In some embodiments, the biological sample is isolated from a subject. Any suitable biological sample that contains one or more genetic biomarkers, protein biomarkers, and/or aneuploidy can be used in accordance with any of the variety of methods described herein. For example, the biological sample can include blood, plasma, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. Methods of isolating biological samples from a subject are known to those of ordinary skill in the art.

In some embodiments, the subject may be selected for further diagnostic testing. In some embodiments, methods provided herein can be used to select a subject for further diagnostic testing at a time period prior to the time period when conventional techniques are capable of diagnosing the subject with an early-stage cancer. For example, methods provided herein for selecting a subject for further diagnostic testing can be used when a subject has not been diagnosed with cancer by conventional methods and/or when a subject is not known to harbor a cancer. In some embodiments, a subject selected for further diagnostic testing can be administered a diagnostic test (e.g., any of the diagnostic tests described herein) at an increased frequency compared to a subject that has not been selected for further diagnostic testing. For example, a subject selected for further diagnostic testing can be administered a diagnostic test at a frequency of twice daily, daily, bi-weekly, weekly, bi-monthly, monthly, quarterly, semi-annually, annually, or at any frequency therein. In some embodiments, a subject selected for further diagnostic testing can be administered one or more additional diagnostic tests compared to a subject that has not been selected for further diagnostic testing. For example, a subject selected for further diagnostic testing can be administered two diagnostic tests or more, whereas a subject that has not been selected for further diagnostic testing is administered only a single diagnostic test (or no diagnostic tests). In some embodiments, the diagnostic testing method can determine the presence of the same type of cancer as the originally detected cancer. Additionally or alternatively, the diagnostic testing method can determine the presence of a different type of cancer from the originally detected cancer.

In some embodiments, the diagnostic testing method is a scan. In some embodiments, the scan is a bone scan, a computed tomography (CT), a CT angiography (CTA), an esophagram (a Barium swallow), a Barium enema, a gallium scan, a magnetic resonance imaging (MRI), a mammography, a monoclonal antibody scan (e.g., ProstaScint® scan for prostate cancer, OncoScint® scan for ovarian cancer, and CEA-Scan® for colon cancer), a multigated acquisition (MUGA) scan, a PET scan, a PET/CT scan, a thyroid scan, an ultrasound (e.g., a breast ultrasound, an endobronchial ultrasound, an endoscopic ultrasound, a transvaginal ultrasound), an X-ray, or a DEXA scan.

In some embodiments, the diagnostic testing method is a physical examination, such as, without limitation, an anoscopy, a biopsy, a bronchoscopy (e.g., an autofluorescence bronchoscopy, a white-light bronchoscopy, a navigational bronchoscopy), a digital breast tomosynthesis, a digital rectal exam, an endoscopy, including but not limited to a capsule endoscopy, virtual endoscopy, an arthroscopy, a bronchoscopy, a colonoscopy, a colposcopy, a cystoscopy, an esophagoscopy, a gastroscopy, a laparoscopy, a laryngoscopy, a neuroendoscopy, a proctoscopy, a sigmoidoscopy, a skin cancer exam, a thoracoscopy, an endoscopic retrograde cholangiopancreatography (ERCP), an esophagogastroduodenoscopy, or a pelvic exam.

In some embodiments, the diagnostic testing method is a biopsy (e.g., a bone marrow aspiration, a tissue biopsy). In some embodiments, the biopsy is performed by fine needle aspiration or by surgical excision. In some embodiments, the diagnostic testing method(s) further include obtaining a biological sample (e.g., a tissue sample, a urine sample, a blood sample, a cheek swab, a saliva sample, a mucosal sample (e.g., sputum, bronchial secretion), a nipple aspirate, a secretion or an excretion). In some embodiments, the diagnostic testing method(s) include determining exosomal proteins (e.g., an exosomal surface protein (e.g., CD24, CD147, PCA-3)) (Soung et al. (2017) Cancers 9(1):pii:E8). In some embodiments, the diagnostic testing method is an Oncotype DX® test (Baehner (2016) Ecancermedicalscience 10:675).

In some embodiments, the diagnostic testing method is a test, such as, without limitation, an alpha-fetoprotein blood test, a bone marrow test, a fecal occult blood test, a human papillomavirus test, low-dose helical computed tomography, a lumbar puncture, a prostate specific antigen (PSA) test, a pap smear, or a tumor marker test.

In some embodiments, the diagnostic testing method includes determining the level of a known protein biomarker (e.g., CA-125 or prostate specific antigen (PSA)). For example, a high amount of CA-125 can be found in the blood of a subject who has ovarian cancer, endometrial cancer, fallopian tube cancer, pancreatic cancer, stomach cancer, esophageal cancer, colon cancer, liver cancer, breast cancer, or lung cancer. The term “biomarker” as used herein refers to “a biological molecule found in blood, other bodily fluids, or tissues that is a sign of a normal or abnormal process, or of a condition or disease”, e.g., as defined by the National Cancer Institute (see, e.g., the URL www.cancer.gov/publications/dictionaries/cancer-terms?CdrID=45618). A biomarker can include a genetic biomarker such as, without limitation, a nucleic acid (e.g., a DNA molecule, an RNA molecule (e.g., a microRNA, a long non-coding RNA (lncRNA) or other non-coding RNA)). A biomarker can include a protein biomarker such as, without limitation, a peptide, a protein, or a fragment thereof.

In some embodiments, the biomarker is FLT3, NPM1, CEBPA, PRAM1, ALK, BRAF, KRAS, EGFR, Kit, NRAS, JAK2, KRAS, HPV virus, ERBB2, BCR-ABL, BRCA1, BRCA2, CEA, AFP, and/or LDH. See e.g., Easton et al. (1995) Am. J. Hum. Genet. 56: 265-271, Hall et al. (1990) Science 250: 1684-1689, Lin et al. (2008) Ann. Intern. Med. 149: 192-199, Allegra et al. (2009) J. Clin. Oncol. 27: 2091-2096, Paik et al. (2004) N. Engl. J. Med. 351: 2817-2826, Bang et al. (2010) Lancet 376: 687-697, Piccart-Gebhart et al. (2005) N. Engl. J. Med. 353: 1659-1672, Romond et al. (2005) N. Engl. J. Med. 353: 1673-1684, Locker et al. (2006) J. Clin. Oncol. 24: 5313-5327, Gilligan et al. (2010) J. Clin. Oncol. 28: 3388-3404, Harris et al. (2007) J. Clin. Oncol. 25: 5287-5312; Henry and Hayes (2012) Mol. Oncol. 6: 140-146. In some embodiments, the biomarker is a biomarker for detection of breast cancer in a subject, such as, without limitation, MUC-1, CEA, p53, urokinase plasminogen activator, BRCA1, BRCA2, and/or HER2 (Gam (2012) World J. Exp. Med. 2(5): 86-91). In some embodiments, the biomarker is a biomarker for detection of lung cancer in a subject, such as, without limitation, KRAS, EGFR, ALK, MET, and/or ROS1 (Mao (2002) Oncogene 21: 6960-6969; Korpanty et al. (2014) Front Oncol. 4: 204). In some embodiments, the biomarker is a biomarker for detection of ovarian cancer in a subject, such as, without limitation, HPV, CA-125, HE4, CEA, VCAM-1, KLK6/7, GST1, PRSS8, FOLR1, and/or ALDH1 (Nolen and Lokshin (2012) Future Oncol. 8(1): 55-71; Sarojini et al. (2012) J. Oncol. 2012:709049). In some embodiments, the biomarker is a biomarker for detection of colorectal cancer in a subject, such as, without limitation, MLH1, MSH2, MSH6, PMS2, KRAS, and BRAF (Gonzalez-Pons and Cruz-Correa (2015) Biomed. Res. Int. 2015: 149014; Alvarez-Chaver et al. (2014) World J. Gastroenterol. 20(14): 3804-3824). In some embodiments, the diagnostic testing method determines the presence and/or expression level of a nucleic acid (e.g., microRNA (Sethi et al. (2011) J. Carcinog. Mutag. S1-005), RNA, a SNP (Hosein et al. (2013) Lab. Invest doi: 10.1038/labinvest.2013.54; Falzoi et al. (2010) Pharmacogenomics 11: 559-571), methylation status (Castelo-Branco et al. (2013) Lancet Oncol 14: 534-542), a hotspot cancer mutation (Yousem et al. (2013) Chest 143: 1679-1684)). Non-limiting examples of methods of detecting a nucleic acid in a sample include: PCR, RT-PCR, sequencing (e.g., next generation sequencing methods, deep sequencing), a DNA microarray, a microRNA microarray, a SNP microarray, fluorescent in situ hybridization (FISH), restriction fragment length polymorphism (RFLP), gel electrophoresis, Northern blot analysis, Southern blot analysis, chromogenic in situ hybridization (CISH), chromatin immunoprecipitation (ChIP), SNP genotyping, and DNA methylation assay. See, e.g., Meldrum et al. (2011) Clin. Biochem. Rev. 32(4): 177-195; Sidransky (1997) Science 278(5340): 1054-9.

In some embodiments, the diagnostic testing method includes determining the presence of a protein biomarker in a sample (e.g., a plasma biomarker (Mirus et al. (2015) Clin. Cancer Res. 21(7): 1764-1771)). Non-limiting examples of methods of determining the presence of a protein biomarker include: western blot analysis, immunohistochemistry (IHC), immunofluorescence, mass spectrometry (MS) (e.g., matrix assisted laser desorption/ionization (MALDI)-MS, surface enhanced laser desorption/ionization time-of-flight (SELDI-TOF)-MS), enzyme-linked immunosorbent assay (ELISA), flow cytometry, proximity assay (e.g., VeraTag proximity assay (Shi et al. (2009) Diagnostic molecular pathology: the American journal of surgical pathology, part B: 18: 11-21, Huang et al. (2010) Am. J. Clin. Pathol. 134: 303-11)), a protein microarray (e.g., an antibody microarray (Ingvarsson et al. (2008) Proteomics 8: 2211-9, Woodbury et al. (2002) J. Proteome Res. 1: 233-237), an IHC-based microarray (Stromberg et al. (2007) Proteomics 7: 2142-50), a microarray ELISA (Schroder et al. (2010) Mol. Cell. Proteomics 9: 1271-80)). In some embodiments, the method of determining the presence of a protein biomarker is a functional assay. In some embodiments, the functional assay is a kinase assay (Ghosh et al. (2010) Biosensors & Bioelectronics 26: 424-31, Mizutani et al. (2010) Clin. Cancer Res. 16: 3964-75, Lee et al. (2012) Biomed. Microdevices 14: 247-57) or a protease assay (Lowe et al. (2012) ACS nano. 6: 851-7, Fujiwara et al. (2006) Breast cancer 13: 272-8, Darragh et al. (2010) Cancer Res 70: 1505-12). See, e.g., Powers and Palecek (2015) J. Healthc. Eng. 3(4): 503-534, for a review of protein analytical assays for diagnosing cancer patients.

In some embodiments, the diagnostic testing method includes detecting the presence of aneuploidy in a biological sample (e.g., detecting whether the biological sample contains cells with an abnormal number of chromosomes). Non-limiting examples of methods of detecting the presence of aneuploidy include karyotyping, digital karyotyping, fluorescence in situ hybridization (FISH), quantitative PCR of short tandem repeats, quantitative fluorescence PCR (QF-PCR), quantitative PCR dosage analysis, quantitative mass spectrometry of single nucleotide polymorphisms, and comparative genomic hybridization (CGH).

In some embodiments, a subject that has been selected for further diagnostic testing can also be selected for increased monitoring. Once the presence of a cancer cell has been identified (e.g., by any of the variety of methods described herein), it may be beneficial for the subject to undergo both increased monitoring (e.g., to assess the progression of the tumor or cancer in the subject and/or to assess the development of additional cancer cell mutations), and further diagnostic testing (e.g., to determine the size and/or exact location of the tumor harboring the cancer cell).

In some embodiments, a subject that is selected for further diagnostic testing can also be selected for a therapeutic intervention. Any of the therapeutic interventions described herein or known in the art can be administered. For example, a subject that has been selected for further diagnostic testing can be administered a further diagnostic test, and a therapeutic intervention can be administered if the presence of the cancer cell is confirmed. Additionally or alternatively, a subject that has been selected for further diagnostic testing can be administered a therapeutic intervention, and can be further monitored as the therapeutic intervention progresses. In some embodiments, after a subject that has been selected for further diagnostic testing has been administered a therapeutic intervention, the additional testing will reveal the presence of one or more additional genetic biomarkers, the presence of one or more additional protein biomarkers, and/or the presence of aneuploidy. In some embodiments, the presence of one or more additional genetic biomarkers, the presence of one or more additional protein biomarkers, and/or the presence of aneuploidy will provide cause to administer a different therapeutic intervention (e.g., a resistance mutation may arise in a cancer cell during the therapeutic intervention, such that the cancer cell harboring the resistance mutation is resistant to the original therapeutic intervention).

Increased Monitoring

Also provided herein are methods for selecting a subject for increased monitoring. In some embodiments, methods for selecting a subject for increased monitoring include detecting the presence of one or more genetic biomarkers in a biological sample isolated from the subject, detecting the presence of one or more protein biomarkers in a biological sample isolated from the subject, and/or detecting the presence of aneuploidy in a biological sample isolated from the subject, and selecting a subject for increased monitoring when the presence of one or more genetic biomarkers, one or more protein biomarkers, or aneuploidy is identified. In some embodiments, methods for selecting a subject for increased monitoring further include detecting the presence of one or more members of one or more other classes of biomarkers. In some embodiments, the step of detecting is performed when the subject is not known to harbor a cancer cell.
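The selection criterion described above (select the subject when one or more genetic biomarkers, one or more protein biomarkers, or aneuploidy is identified) can be illustrated with a minimal sketch. The sketch is illustrative only and is not part of the disclosed methods; the names SampleFindings and select_for_increased_monitoring, and the example marker labels, are hypothetical and are assumed here solely for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SampleFindings:
    """Hypothetical summary of what was detected in a biological sample."""
    genetic_biomarkers: Tuple[str, ...] = ()   # e.g., genes in which a mutation was detected
    protein_biomarkers: Tuple[str, ...] = ()   # e.g., proteins scored as present
    aneuploidy_detected: bool = False


def select_for_increased_monitoring(findings: SampleFindings) -> bool:
    """Return True when any one of the three findings is present.

    Mirrors the "or" criterion in the text: one or more genetic biomarkers,
    one or more protein biomarkers, or aneuploidy.
    """
    return (
        len(findings.genetic_biomarkers) > 0
        or len(findings.protein_biomarkers) > 0
        or findings.aneuploidy_detected
    )


if __name__ == "__main__":
    no_findings = SampleFindings()
    one_mutation = SampleFindings(genetic_biomarkers=("NRAS",))
    print(select_for_increased_monitoring(no_findings))   # False
    print(select_for_increased_monitoring(one_mutation))  # True
```

A stricter embodiment (for example, one requiring members of two or more classes) would replace the "or" with a count of satisfied classes; the sketch shows only the criterion stated in the preceding paragraph.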

In some embodiments, the biological sample is isolated from a subject. Any suitable biological sample that contains one or more genetic biomarkers, protein biomarkers, and/or aneuploidy can be used in accordance with any of the variety of methods disclosed herein. For example, the biological sample can include blood, plasma, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. Methods of isolating biological samples from a subject are known to those of ordinary skill in the art.

In some embodiments, once a subject has been determined to have a cancer, the subject may be selected for increased or additional monitoring. In some embodiments, methods provided herein can be used to select a subject for increased monitoring at a time period prior to the time period when conventional techniques are capable of diagnosing the subject with an early-stage cancer. For example, methods provided herein for selecting a subject for increased monitoring can be used when a subject has not been diagnosed with cancer by conventional methods and/or when a subject is not known to harbor a cancer. In some embodiments, a subject selected for increased monitoring can be administered a diagnostic test (e.g., any of the diagnostic tests disclosed herein) at an increased frequency compared to a subject that has not been selected for increased monitoring. For example, a subject selected for increased monitoring can be administered a diagnostic test at a frequency of twice daily, daily, bi-weekly, weekly, bi-monthly, monthly, quarterly, semi-annually, annually, or at any frequency therein. In some embodiments, a subject selected for increased monitoring can be administered one or more additional diagnostic tests compared to a subject that has not been selected for increased monitoring. For example, a subject selected for increased monitoring can be administered two diagnostic tests, whereas a subject that has not been selected for increased monitoring is administered only a single diagnostic test (or no diagnostic tests).

In some embodiments, a subject that has been selected for increased monitoring can also be selected for further diagnostic testing. Once the presence of a cancer cell has been identified (e.g., by any of the variety of methods described herein), it may be beneficial for the subject to undergo both increased monitoring (e.g., to assess the progression of the tumor or cancer in the subject and/or to assess the development of additional cancer cell mutations), and further diagnostic testing (e.g., to determine the size and/or exact location of the tumor harboring the cancer cell).

In some embodiments, a subject that is selected for increased monitoring can also be selected for a therapeutic intervention. Any of the therapeutic interventions described herein or known in the art can be administered. For example, a subject that has been selected for increased monitoring can be further monitored, and a therapeutic intervention can be administered if the presence of the cancer cell is maintained throughout the increased monitoring period. Additionally or alternatively, a subject that has been selected for increased monitoring can be administered a therapeutic intervention, and further monitored as the therapeutic intervention progresses. In some embodiments, after a subject that has been selected for increased monitoring has been administered a therapeutic intervention, the increased monitoring will reveal the presence of one or more additional genetic biomarkers, the presence of one or more additional protein biomarkers, and/or the presence of aneuploidy. In some embodiments, the presence of one or more additional genetic biomarkers, the presence of one or more additional protein biomarkers, and/or the presence of aneuploidy will provide cause to administer a different therapeutic intervention (e.g., a resistance mutation may arise in a cancer cell during the therapeutic intervention, such that the cancer cell harboring the resistance mutation is resistant to the original therapeutic intervention).

Therapeutic Interventions

In some embodiments, once a subject has been determined to have a cancer (e.g., a cervical, endometrial, ovarian, or fallopian tubal cancer), or is suspected of having cancer, the subject may be administered a therapeutic intervention or selected for therapeutic intervention. In some embodiments, wherein the presence of cancer (e.g., a cervical, endometrial, ovarian, or fallopian tubal cancer) has been detected in a subject, the subject is administered a therapeutic intervention that specifically targets the subject's cancer (e.g., genetic modifications present in the cervical, endometrial, ovarian, or fallopian tubal cancer). For example, when a subject is determined to have ovarian cancer, a therapeutic intervention appropriate for ovarian cancer can be administered. As another example, when a subject is determined to have endometrial cancer, a therapeutic intervention appropriate for endometrial cancer can be administered. In some embodiments, the therapeutic intervention is chemotherapy (e.g., any of the platinum-based chemotherapeutic agents described herein (e.g., cisplatin, carboplatin), or a taxane (e.g., paclitaxel (Taxol®) or docetaxel (Taxotere®))). In some embodiments, the chemotherapeutic agent is an albumin-bound paclitaxel (nab-paclitaxel, Abraxane®), altretamine (Hexalen®), capecitabine (Xeloda®), cyclophosphamide (Cytoxan®), etoposide (VP-16), gemcitabine (Gemzar®), ifosfamide (Ifex®), irinotecan (CPT-11, Camptosar®), liposomal doxorubicin (Doxil®), melphalan, pemetrexed (Alimta®), topotecan, or vinorelbine (Navelbine®). In some embodiments, the therapeutic intervention is a combination of chemotherapeutic agents (e.g., paclitaxel, ifosfamide, and cisplatin; vinblastine, ifosfamide and cisplatin; etoposide, ifosfamide and cisplatin). In some embodiments, the therapeutic intervention is an epigenetic therapy (see, e.g., Smith et al. (2017) Gynecol. Oncol. Rep. 20: 81-86).
In some embodiments, the epigenetic therapy is a DNA methyltransferase (DNMT) inhibitor (e.g., 5-azacytidine (5-AZA), decitabine (5-aza-2′-deoxycytidine) (Fu et al. (2011) Cancer 117(8): 1661-1669; Falchook et al. (2013) Investig. New Drugs 31(5): 1192-1200; Matei et al. (2012) Cancer Res. 72(9): 2197-2205)). In some embodiments, the DNMT1 inhibitor is NY-ESO-1 (Odunsi et al. (2014) Cancer Immunol. Res. 2(1): 37-49). In some embodiments, the epigenetic therapy is a histone deacetylase (HDAC) inhibitor. In some embodiments, the HDAC inhibitor is vorinostat (Modesitt (2008) 109(2): 182-186) or belinostat (Mackay et al. (2010) Eur. J. Cancer 46(9): 1573-1579). In some embodiments, the HDAC inhibitor is given in combination with a chemotherapeutic agent (e.g., carboplatin (Paraplatin), cisplatin, paclitaxel or docetaxel (Taxotere)) (Mendivil (2013) Int. J. Gynecol. Cancer 23(3): 533-539; Dizon (2012) Gynecol. Oncol. 125(2): 367-371; Dizon (2012) Int J. Gynecol. Cancer 23(3): 533-539). In some embodiments, the therapeutic intervention is an anti-angiogenic agent (e.g., bevacizumab). In some embodiments, the therapeutic intervention is a poly (ADP-ribose) polymerase (PARP)-1 and/or PARP-2 inhibitor. In some embodiments, the PARP-1 and PARP-2 inhibitor is niraparib (Zejula) (Scott (2017) Drugs doi: 10.1007/s40265-017-0752). In some embodiments, the PARP inhibitor is olaparib (Lynparza) or rucaparib (Rubraca). In some embodiments, the therapeutic intervention is a hormone (e.g., a luteinizing-hormone-releasing hormone (LHRH) agonist). In some embodiments, the LHRH agonist is goserelin (Zoladex®) or leuprolide (Lupron®). In some embodiments, the therapeutic intervention is an anti-estrogen compound (e.g., tamoxifen). In some embodiments, the therapeutic intervention is an aromatase inhibitor (e.g., letrozole (Femara®), anastrozole (Arimidex®) or exemestane (Aromasin®)).
In some embodiments, the therapeutic intervention is surgery (e.g., debulking of the tumor mass, a hysterectomy, a bilateral salpingo-oophorectomy, an omentectomy). The term “debulking” refers to surgical removal of almost the entire tumor (“optimally debulked”). In some embodiments, debulking can include removing a portion of the bladder, the spleen, the gallbladder, the stomach, the liver, and/or pancreas. In some embodiments, adjuvant chemotherapy is further administered to the subject after surgery (e.g., debulking of the tumor mass, a hysterectomy, a bilateral salpingo-oophorectomy, an omentectomy). In some embodiments, adjuvant chemotherapy is administered intra-abdominally (intraperitoneally). In some embodiments, the therapeutic intervention is a prophylactic surgery (e.g., a hysterectomy). In some embodiments, a paracentesis is performed to remove ascites.

In some embodiments, once a subject has been determined to have a cancer (e.g., a bladder cancer or a UTUC) according to any of the variety of methods provided herein, the subject may be administered a therapeutic intervention or selected for therapeutic intervention. For example, when a subject is determined to have bladder cancer, a therapeutic intervention appropriate for bladder cancer can be administered. Examples of such therapeutic interventions that are appropriate for bladder cancer include, without limitation, transurethral resection of the bladder (TURB), intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, cystectomy or cystoprostatectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination of the above. As another example, when a subject is determined to have a UTUC, a therapeutic intervention appropriate for a UTUC can be administered. Examples of such therapeutic interventions that are appropriate for a UTUC include, without limitation, transurethral resection, intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, ureterectomy or nephroureterectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination of the above.

In some embodiments, the detected cancer is a low-grade tumor (e.g., a papillary urothelial neoplasm of low malignant potential (PUNLMP) or a non-invasive low-grade papillary urothelial carcinoma). In some embodiments, once a subject has been determined to have a low-grade tumor, the subject may be administered a therapeutic intervention or selected for therapeutic intervention that includes transurethral resection of the bladder (TURB).

In some embodiments, wherein the presence of colorectal cancer has been detected in a subject, the subject is administered a therapeutic intervention that specifically targets the subject's colorectal cancer (e.g., genetic modifications present in the colorectal cancer). In some embodiments, the subject is administered an anti-EGFR monoclonal antibody (e.g., cetuximab or panitumumab) (Cunningham et al. (2004) N. Engl. J. Med. 351(4): 337-345). In some embodiments, the therapeutic intervention is an antiangiogenic agent. In some embodiments, the antiangiogenic agent is bevacizumab (Avastin) (Hurwitz et al. (2004) N. Engl. J. Med. 350: 2335-2342). In some embodiments, the antiangiogenic agent is a VEGF inhibitor (e.g., aflibercept (Tang et al. (2008) J. Clin. Oncol 26 (May 20 suppl; abstr 4027)); vatalanib (PTK/ZK222584; Hecht et al. (2005) ASCO Annual Meeting Proceedings J. Clin. Oncol. 23: 16S (abstr. LBA3)); sunitinib (Saltz et al. (2007) J. Clin. Oncol. 25: 4793-4799); AZD2171 (Rosen et al. (2007) J. Clin. Oncol. 25: 2369-76); AMG 706 (Drevis et al. (2007) 25: 3045-2054)). In some embodiments, bevacizumab is administered with a chemotherapy treatment (see, e.g., Hurwitz et al. (2004) N. Engl. J. Med. 350: 2335-2342; Gruenberger et al. (2008) J. Clin. Oncol. 26: 1830-1835). Non-limiting examples of chemotherapy treatments that can be used in a subject with colorectal cancer include: 5-FU, leucovorin, oxaliplatin (Eloxatin), capecitabine, celecoxib and sulindac. In some embodiments, a combination of chemotherapeutic agents is used, e.g., FOLFOX (5-FU, leucovorin and oxaliplatin), FOLFIRI (leucovorin, 5-FU and irinotecan (Camptosar)), or CapeOx (capecitabine (Xeloda) and oxaliplatin). In some embodiments, the therapeutic intervention is a mammalian target of rapamycin (mTOR) inhibitor (e.g., a rapamycin analog (Kesmodel et al. (2007) Gastrointestinal Cancers Symposium (abstr 234)); RAD-001 (Tabernero et al. (2008) J. Clin. Oncol. 26: 1603-1610)).
In some embodiments, the therapeutic intervention is a protein kinase C antagonist (e.g., enzastaurin (Camidge et al. (2008) Anticancer Drugs 19: 77-84, Resta et al. (2008) J. Clin. Oncol. 26 (May 20 suppl) (abstr 3529))). In some embodiments, the therapeutic intervention is an inhibitor of nonreceptor tyrosine kinase Src (e.g., AZ0530 (Tabernero et al. (2007) J. Clin. Oncol. 25: 18S (abstr 3520))). In some embodiments, the therapeutic intervention is an inhibitor of kinesin spindle protein (KSP) (e.g., ispinesib (SB-715992) (Chu et al. (2004) J. Clin. Oncol. 22: 14S (abstr 2078), Burris et al. (2004) J. Clin. Oncol. 22: 128 (abstr 2004))).

In some embodiments, wherein the presence of lung cancer has been detected in a subject, the subject is administered a therapeutic intervention that specifically targets the subject's lung cancer (e.g., genetic modifications present in the lung cancer). In some embodiments, the therapeutic intervention is chemotherapy (e.g., any of the platinum-based chemotherapeutic agents described herein (e.g., cisplatin, carboplatin), or a taxane (e.g., paclitaxel (Taxol®) or docetaxel (Taxotere®))). In some embodiments, the chemotherapeutic agent is an albumin-bound paclitaxel (nab-paclitaxel, Abraxane®), altretamine (Hexalen®), capecitabine (Xeloda®), cyclophosphamide (Cytoxan®), etoposide (VP-16), gemcitabine (Gemzar®), ifosfamide (Ifex®), irinotecan (CPT-11, Camptosar®), liposomal doxorubicin (Doxil®), melphalan, pemetrexed (Alimta®), topotecan, or vinorelbine (Navelbine®). In some embodiments, the therapeutic intervention is a combination of chemotherapeutic agents (e.g., paclitaxel, ifosfamide, and cisplatin; vinblastine, ifosfamide and cisplatin; etoposide, ifosfamide and cisplatin). In some embodiments, the therapeutic intervention is an epigenetic therapy (see, e.g., Smith et al. (2017) Gynecol. Oncol. Rep. 20: 81-86). In some embodiments, the epigenetic therapy is a DNA methyltransferase (DNMT) inhibitor (e.g., 5-azacytidine (5-AZA), decitabine (5-aza-2′-deoxycytidine) (Fu et al. (2011) Cancer 117(8): 1661-1669; Falchook et al. (2013) Investig. New Drugs 31(5): 1192-1200; Matei et al. (2012) Cancer Res. 72(9): 2197-2205)). In some embodiments, the DNMT1 inhibitor is NY-ESO-1 (Odunsi et al. (2014) Cancer Immunol. Res. 2(1): 37-49). In some embodiments, the epigenetic therapy is a histone deacetylase (HDAC) inhibitor. In some embodiments, the HDAC inhibitor is vorinostat (Modesitt (2008) 109(2): 182-186) or belinostat (Mackay et al. (2010) Eur. J. Cancer 46(9): 1573-1579).
In some embodiments, the HDAC inhibitor is given in combination with a chemotherapeutic agent (e.g., carboplatin (Paraplatin), cisplatin, paclitaxel or docetaxel (Taxotere)) (Mendivil (2013) Int. J. Gynecol. Cancer 23(3): 533-539; Dizon (2012) Gynecol. Oncol. 125(2): 367-371; Dizon (2012) Int J. Gynecol. Cancer 23(3): 533-539). In some embodiments, the therapeutic intervention is an anti-angiogenic agent (e.g., bevacizumab). In some embodiments, the therapeutic intervention is a poly (ADP-ribose) polymerase (PARP)-1 and/or PARP-2 inhibitor. In some embodiments, the PARP-1 and PARP-2 inhibitor is niraparib (Zejula) (Scott (2017) Drugs doi: 10.1007/s40265-017-0752). In some embodiments, the PARP inhibitor is olaparib (Lynparza) or rucaparib (Rubraca). In some embodiments, the therapeutic intervention is a hormone (e.g., a luteinizing-hormone-releasing hormone (LHRH) agonist). In some embodiments, the LHRH agonist is goserelin (Zoladex®) or leuprolide (Lupron®). In some embodiments, the therapeutic intervention is an anti-estrogen compound (e.g., tamoxifen). In some embodiments, the therapeutic intervention is an aromatase inhibitor (e.g., letrozole (Femara®), anastrozole (Arimidex®) or exemestane (Aromasin®)). In some embodiments, the therapeutic intervention is surgery (e.g., debulking of the tumor mass, a hysterectomy, a bilateral salpingo-oophorectomy, an omentectomy). The term “debulking” refers to surgical removal of almost the entire tumor (“optimally debulked”). In some embodiments, debulking can include removing a portion of the bladder, the spleen, the gallbladder, the stomach, the liver, and/or pancreas. In some embodiments, adjuvant chemotherapy is further administered to the subject after surgery (e.g., debulking of the tumor mass, a hysterectomy, a bilateral salpingo-oophorectomy, an omentectomy). In some embodiments, adjuvant chemotherapy is administered intra-abdominally (intraperitoneally).
In some embodiments, the therapeutic intervention is a prophylactic surgery (e.g., a hysterectomy). In some embodiments, a paracentesis is performed to remove ascites.

In some embodiments, wherein the presence of breast cancer has been detected in a subject, the subject is administered a therapeutic intervention that specifically targets the subject's breast cancer (e.g., genetic modifications present in the breast cancer). In some embodiments, the targeted drug therapy is a HER2 inhibitor (e.g., trastuzumab (Herceptin), pertuzumab (Perjeta), ado-trastuzumab emtansine (T-DM1; Kadcyla), lapatinib (Tykerb), neratinib). See, e.g., Baselga et al. (2012) N Engl J Med 366: 109-119; Konecny et al. (2006) Cancer Res 66: 1630-1639; Xia et al. (2007) Cancer Res. 67: 1170-1175; Gomez et al. (2008) J Clin Oncol 26: 2999-3005; Wong et al. (2009) Clin. Cancer Res. 15: 2552-2558; Agus et al. (2002) Cancer Cell 2: 127-137; Lewis Philips et al. (2008) Cancer Res 68: 9280-9290. In some embodiments, the targeted drug therapy is a cyclin-dependent kinase inhibitor (e.g., a CDK4/6 inhibitor (e.g., palbociclib (Ibrance®), ribociclib (Kisqali®), abemaciclib) (Turner et al. (2015) N Engl J Med 373: 209-219; Finn et al. (2016) N Engl J Med 375: 1925-1936; Ehab and Elbaz (2016) Breast Cancer 8: 83-91; Xu et al. (2017) J Hematol. Oncol. 10(1): 97; Corona et al. (2017) Crit Rev Oncol Hematol 112: 208-214; Barroso-Sousa et al. (2016) Breast Care 11(3): 167-173)). In some embodiments, the targeted drug therapy is a PARP inhibitor (e.g., olaparib (AZD2281), veliparib (ABT-888), niraparib (MK-4827), talazoparib (BMN-673), rucaparib (AG-14699), CEP-9722). See, e.g., Audeh et al. (2010) Lancet 376: 245-251; Fong et al. (2009) N Engl J Med 361: 123-134; Livraghi and Garber (2015) BMC Medicine 13: 188; Kaufman et al. (2015) J Clin. Oncol. 33: 244-250; Gelmon et al. (2011) Lancet Oncol. 12: 852-61; Isakoff et al. (2011) Cancer Res 71: P3-16-05; Sandhu et al. (2013) Lancet Oncol 14: 882-92; Tutt et al. (2010) Lancet 376: 235-44; Somlo et al. (2013) J. Clin. Oncol. 31: 1024; Shen et al. (2013) Clin. Cancer Res. 19(18): 5003-15; Awada et al. (2016) Anticancer Drugs 27(4): 342-8.
In some embodiments, the targeted drug therapy is an mTOR inhibitor (e.g., everolimus (Afinitor)). See, e.g., Gong et al. (2017) Oncotarget doi: 10.18632/oncotarget.16336; Louseberg et al. (2017) Breast Cancer 10: 239-252; Hare and Harvey (2017) Am J Cancer Res 7(3): 383-404. In some embodiments, the targeted drug therapy is a heat shock protein 90 inhibitor (e.g., tanespimycin) (Modi et al. (2008) J. Clin Oncol. 26: s1027; Miller et al. (2007) J. Clin. Oncol. 25: s1115; Schulz et al. (2012) J Exp Med 209(2): 275-89). In some embodiments, the targeted drug therapy further includes a bone-modifying drug (e.g., a bisphosphonate or denosumab (Xgeva)). See, e.g., Ethier et al. (2017) Curr Oncol Rep 19(3): 15; Abdel-Rahman (2016) Expert Rev Anticancer Ther 16(8): 885-91. In some embodiments, the therapeutic intervention is a hormone (e.g., a luteinizing-hormone-releasing hormone (LHRH) agonist). In some embodiments, the LHRH agonist is goserelin (Zoladex®) or leuprolide (Lupron®). In some embodiments, the therapeutic intervention is an anti-estrogen compound (e.g., tamoxifen, fulvestrant (Faslodex)). In some embodiments, the therapeutic intervention is an aromatase inhibitor (e.g., letrozole (Femara®), anastrozole (Arimidex®) or exemestane (Aromasin®)). In some embodiments, the therapeutic intervention is surgery (e.g., a lumpectomy, a single mastectomy, a double mastectomy, a total mastectomy, a modified radical mastectomy, a sentinel lymph node biopsy, an axillary lymph node dissection, or breast-conserving surgery). The extent of surgical removal will depend on the stage of breast cancer and overall prognosis. In some embodiments, the therapeutic intervention is radiation therapy. In some embodiments, the radiation therapy is partial breast irradiation or intensity-modulated radiation therapy.
In some embodiments, the therapeutic intervention is chemotherapy (e.g., capecitabine (Xeloda), carboplatin (Paraplatin), cisplatin (Platinol), cyclophosphamide (Neosar), docetaxel (Docefrez, Taxotere), doxorubicin (Adriamycin), pegylated liposomal doxorubicin (Doxil), epirubicin (Ellence), fluorouracil (5-FU, Adrucil), gemcitabine (Gemzar), methotrexate, paclitaxel (Taxol), protein-bound paclitaxel (Abraxane), vinorelbine (Navelbine), eribulin (Halaven), or ixabepilone (Ixempra)). In some embodiments, the therapeutic intervention is a combination of at least two chemotherapeutic agents (e.g., doxorubicin and cyclophosphamide (AC); epirubicin and cyclophosphamide (EC); cyclophosphamide, doxorubicin and 5-FU (CAF); cyclophosphamide, epirubicin and 5-FU (CEF); cyclophosphamide, methotrexate and 5-FU (CMF); docetaxel, doxorubicin and cyclophosphamide (TAC); docetaxel and cyclophosphamide (TC)).

In some embodiments, a therapeutic intervention is administered to the subject after a cancer is detected or identified. Any of the therapeutic interventions disclosed herein or known in the art can be administered. Exemplary therapeutic interventions include, without limitation, a kinase inhibitor, an immune checkpoint inhibitor (e.g., a PD-1, a PD-L1, and/or a CTLA-4 immune checkpoint inhibitor), a chemotherapeutic agent, adoptive T cell therapy (e.g., chimeric antigen receptors and/or T cells having wild-type or modified T cell receptors), an antibody, a bispecific antibody or fragments thereof (e.g., BiTEs), chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, cytotoxic therapy, hormone therapy, immunotherapy, a monoclonal antibody, radiation therapy, signal transduction inhibitors, surgery (e.g., surgical resection), a targeted therapy such as administration of kinase inhibitors (e.g., kinase inhibitors that target a particular genetic lesion, such as a translocation or mutation), or any combination thereof. Such therapeutic interventions can be administered alone or in combination. In some embodiments of any of the methods described herein, the one or more therapeutic interventions are administered sequentially or simultaneously to the subject after the cancer cell has been detected. In some embodiments, the therapeutic intervention can be administered at a time when the subject has an early-stage cancer, and wherein the therapeutic intervention is more effective than if the therapeutic intervention were to be administered to a subject at a later time. In some embodiments, a therapeutic intervention can reduce the severity of the cancer, reduce a symptom of the cancer, and/or reduce the number of cancer cells present within the subject.

In some embodiments, the therapeutic intervention can include an immune checkpoint inhibitor. Non-limiting examples of immune checkpoint inhibitors include nivolumab (Opdivo), pembrolizumab (Keytruda), atezolizumab (Tecentriq), avelumab (Bavencio), durvalumab (Imfinzi), and ipilimumab (Yervoy). See, e.g., Pardoll (2012) Nat. Rev Cancer 12: 252-264; Sun et al. (2017) Eur Rev Med Pharmacol Sci 21(6): 1198-1205; Hamanishi et al. (2015) J. Clin. Oncol. 33(34): 4015-22; Brahmer et al. (2012) N Engl J Med 366(26): 2455-65; Ricciuti et al. (2017) J. Thorac Oncol. 12(5): e51-e55; Ellis et al. (2017) Clin Lung Cancer pii: S1525-7304(17)30043-8; Zou and Awad (2017) Ann Oncol 28(4): 685-687; Sorscher (2017) N Engl J Med 376(10): 996-7; Hui et al. (2017) Ann Oncol 28(4): 874-881; Vansteenkiste et al. (2017) Expert Opin Biol Ther 17(6): 781-789; Hellmann et al. (2017) Lancet Oncol. 18(1): 31-41; Chen (2017) J. Chin Med Assoc 80(1): 7-14.

In some embodiments, the therapeutic intervention is adoptive T cell therapy (e.g., chimeric antigen receptors and/or T cells having wild-type or modified T cell receptors). See, e.g., Rosenberg and Restifo (2015) Science 348(6230): 62-68; Chang and Chen (2017) Trends Mol Med 23(5): 430-450; Yee and Lizee (2016) Cancer J. 23(2): 144-148; Chen et al. (2016) Oncoimmunology 6(2): e1273302; US 2016/0194404; US 2014/0050788; US 2014/0271635; U.S. Pat. No. 9,233,125; incorporated by reference in their entirety herein.

In some embodiments, a therapeutic intervention is a chemotherapeutic agent. Non-limiting examples of chemotherapeutic agents include: amsacrine, azacitidine, azathioprine, bevacizumab (or an antigen-binding fragment thereof), bleomycin, busulfan, carboplatin, capecitabine, chlorambucil, cisplatin, cyclophosphamide, cytarabine, dacarbazine, daunorubicin, docetaxel, doxifluridine, doxorubicin, epirubicin, erlotinib hydrochloride, etoposide, floxuridine, fludarabine, fluorouracil, gemcitabine, hydroxyurea, idarubicin, ifosfamide, irinotecan, lomustine, mechlorethamine, melphalan, mercaptopurine, methotrexate, mitomycin, mitoxantrone, oxaliplatin, paclitaxel, pemetrexed, procarbazine, all-trans retinoic acid, streptozocin, tafluposide, temozolomide, teniposide, tioguanine, topotecan, uramustine, valrubicin, vinblastine, vincristine, vindesine, vinorelbine, and combinations thereof. Additional examples of anti-cancer therapies are known in the art; see, e.g., the guidelines for therapy from the American Society of Clinical Oncology (ASCO), European Society for Medical Oncology (ESMO), or National Comprehensive Cancer Network (NCCN).

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in NRAS, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in NRAS is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in NRAS is one or more of a RAS-targeted therapeutic, a receptor tyrosine kinase inhibitor, a Ras-Raf-MEK-ERK pathway inhibitor, a PI3K-Akt-mTOR pathway inhibitor, and a farnesyl transferase inhibitor. In some embodiments, the Ras-Raf-MEK-ERK pathway inhibitor is one or more of a BRAF inhibitor, a MEK inhibitor, and an ERK inhibitor. In some embodiments, the BRAF inhibitor is one or more of vemurafenib (ZELBORAF®), dabrafenib (TAFINLAR®), encorafenib (BRAFTOVI™), BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, R05185426, GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573, R05126766, and LXH254. In some embodiments, the MEK inhibitor is one or more of trametinib (MEKINIST®, GSK1120212), cobimetinib (COTELLIC®), binimetinib (MEKTOVI®, MEK162), selumetinib (AZD6244), PD0325901, MSC1936369B, SHR7390, TAK-733, R05126766, CS3006, WX-554, PD98059, CI1040 (PD184352), and hypothemycin. In some embodiments, the ERK inhibitor is one or more of FRI-20 (ON-01060), VTX-11e, 25-OH-D3-3-BE (B3CD, bromoacetoxycalcidiol), FR-180204, AEZ-131 (AEZS-131), AEZS-136, AZ-13767370, BL-EI-001, LY-3214996, LTT-462, KO-947, MK-8353 (SCH900353), SCH772984, ulixertinib (BVD-523), CC-90003, GDC-0994 (RG-7482), ASNO07, FR148083, 5-7-Oxozeaenol, 5-iodotubercidin, GDC0994, and ONC201.
In some embodiments, the PI3K-Akt-mTOR pathway inhibitor is one or more of a PI3K inhibitor, an AKT inhibitor, and an mTOR inhibitor. In some embodiments, the PI3K inhibitor is one or more of buparlisib (BKM120), alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946), dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604), sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835, GDC-0077, ASNO03, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408), gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980), omipalisib (GSK2126458, GSK458), voxtalisib (XL756, SAR245409), AMG 511, CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402, wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301, KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771. In some embodiments, the AKT inhibitor is one or more of miltefosine (IMPAVIDO®), wortmannin, NL-71-101, H-89, GSK690693, CCT128930, AZD5363, ipatasertib (GDC-0068, RG7440), A-674563, A-443654, AT7867, AT13148, uprosertib, afuresertib, DC120, 2-[4-(2-aminoprop-2-yl)phenyl]-3-phenylquinoxaline, MK-2206, edelfosine, miltefosine, perifosine, erucylphosphocholine, erufosine, SR13668, OSU-A9, PH-316, PHT-427, PIT-1, DM-PIT-1, triciribine (Triciribine Phosphate Monohydrate), API-1, N-(4-(5-(3-acetamidophenyl)-2-(2-aminopyridin-3-yl)-3H-imidazo[4,5-b]pyridin-3-yl)benzyl)-3-fluorobenzamide, ARQ092, BAY 1125976, 3-oxo-tirucallic acid, lactoquinomycin, boc-Phe-vinyl ketone, Perifosine (D-21266), TCN, TCN-P, GSK2141795, and ONC201. In some embodiments, the mTOR inhibitor is one or more of MLN0128, AZD-2014, CC-223, AZD2014, CC-115, everolimus (RAD001), temsirolimus (CCI-779), ridaforolimus (AP-23573), and sirolimus (rapamycin). In some embodiments, the farnesyl transferase inhibitor is one or more of lonafarnib, tipifarnib, BMS-214662, L778123, L744832 and FTI-277.
In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in NRAS is a MEK inhibitor and a PI3K inhibitor. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in NRAS is a MEK inhibitor and an ERK inhibitor. Other therapeutic interventions effective for treating a subject having a genetic biomarker in NRAS are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in NRAS is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in NRAS, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in CTNNB1, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in CTNNB1 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in CTNNB1 is one or more of a β-catenin inhibitor, a WNT/β-catenin signaling inhibitor, and a spindle assembly checkpoint kinase TTK (MPS1) inhibitor. In some embodiments, the β-catenin inhibitor is one or more of PRI-724, CWP232291, PNU74654, 2,4 diamino-quinazoline, PKF115-584, PKF118-744, PKF118-310, PKF222-815, CGP 049090, ZTM000990, BC21, vitamin D, retinoic acid, aspirin, sulindac (CLINORIL®, Aflodac), 2,4 diamino-quinazoline derivatives, methyl 3-{[(4-methylphenyl)sulfonyl]amino}benzoate (MSAB), AV65, iCRT3, iCRT5, and iCRT14.
In some embodiments, the WNT/β-catenin signaling inhibitor is one or more of PRI-724, CWP232291, PNU74654, 2,4 diamino-quinazoline, PKF115-584, PKF118-744, PKF118-310, PKF222-815, CGP 049090, ZTM000990, BC21, vitamin D, retinoic acid, aspirin, sulindac (CLINORIL®, Aflodac), 2,4 diamino-quinazoline derivatives, methyl 3-{[(4-methylphenyl)sulfonyl]amino}benzoate (MSAB), AV65, iCRT3, iCRT5, iCRT14, SM04554, LGK 974, XAV939, curcumin (e.g., Meriva®), quercetin, epigallocatechin gallate (EGCG), resveratrol, DIF, genistein, celecoxib (CELEBREX®), CWP232291, NSC668036, FJ9, BML-286 (3289-8625), IWP, IWP-1, IWP-2, JW55, G007-LK, pyrvinium, foxy-5, Wnt-5a, ipafricept (OMP-54F28), vantictumab (OMP-18R5), OTSA101, OTSA101-DTPA-90Y, SM04690, SM04755, nutlin-3a, XAV939, IWR1, JW74, okadaic acid, tautomycin, SB239063, SB203580, adenosine diphosphate (hydroxymethyl)pyrrolidinediol (ADP-HPD), 2-[4-(4-fluorophenyl)piperazin-1-yl]-6-methylpyrimidin-4(3H)-one, PJ34, niclosamide (NICLOCIDE™), cambinol, sulindac (CLINORIL®, Aflodac), J01-017a, NSC668036, filipin, IC261, PF670462, bosutinib (BOSULIF®), PHA665752, imatinib (GLEEVEC®), ICG-001, ethacrynic acid, ethacrynic acid derivatives, pictilisib (GDC-0941), Rp-8-Br-cAMP, SDX-308, WNT974, CGX1321, ETC-1922159, AD-REIC/Dkk3, WIKI4, and windorphen. In some embodiments, the spindle assembly checkpoint kinase TTK (MPS1) inhibitor is one or more of NTRC 0066-0, CFI-402257, a (5,6-dihydro)pyrimido[4,5-e]indolizine, and BOS172722. Other therapeutic interventions effective for treating a subject having a genetic biomarker in CTNNB1 are known in the art.
In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in CTNNB1 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in CTNNB1, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in PIK3CA, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in PIK3CA is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in PIK3CA is one or more of a PI3K-alpha inhibitor, a panPI3K inhibitor, and a dual PI3K and mTOR inhibitor. In some embodiments, the PI3K-alpha inhibitor is one or more of taselisib (GDC-0032, RG7604), GDC-0077, serabelisib (TAK-117, MLN1117, INK1117), alpelisib (BYL719), and CH5132799. In some embodiments, the panPI3K inhibitor is one or more of buparlisib (BKM120), copanlisib (ALIQOPA™, BAY80-6946), sonolisib (PX-866), ZSTK474, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408), AMG 511, PKI-402, wortmannin, LY294002, and WX-037.
In some embodiments, the dual PI3K and mTOR inhibitor is one or more of dactolisib (NVP-BEZ235, BEZ-235), PQR309, SF1126, gedatolisib (PF-05212384, PKI-587), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980), omipalisib (GSK2126458, GSK458), voxtalisib (XL765, SAR245409), GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), and PI-103. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in PIK3CA is one or more of buparlisib (BKM120), alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946), dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604), sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835, GDC-0077, ASN003, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408), gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980), omipalisib (GSK2126458, GSK458), voxtalisib (XL765, SAR245409), AMG 511, CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402, wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301, KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771. Other therapeutic interventions effective for treating a subject having a genetic biomarker in PIK3CA are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in PIK3CA is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in PIK3CA, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in FBXW7, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in FBXW7 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in FBXW7 is one or more of an mTOR inhibitor and an MCL-1 inhibitor. In some embodiments, the mTOR inhibitor is one or more of MLN0128, AZD-2014, CC-223, AZD2014, CC-115, everolimus (RAD001), temsirolimus (CCI-779), ridaforolimus (AP-23573), and sirolimus (rapamycin). In some embodiments, the MCL-1 inhibitor is one or more of S63845, AZD5991, AMG 176, 483-LM, and MIK665. Other therapeutic interventions effective for treating a subject having a genetic biomarker in FBXW7 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in FBXW7 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in FBXW7, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in APC, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in APC is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in APC is one or more of TASIN-1 (Truncated APC Selective INhibitor) and a WNT/β-catenin signaling inhibitor. In some embodiments, the WNT/β-catenin signaling inhibitor is one or more of PRI-724, CWP232291, PNU74654, 2,4 diamino-quinazoline, PKF115-584, PKF118-744, PKF118-310, PKF222-815, CGP 049090, ZTM000990, BC21, vitamin D, retinoic acid, aspirin, sulindac (CLINORIL®, Aflodac), 2,4 diamino-quinazoline derivatives, methyl 3-{[(4-methylphenyl)sulfonyl]amino}benzoate (MSAB), AV65, iCRT3, iCRT5, iCRT14, SM04554, LGK 974, XAV939, curcumin (e.g., Meriva®), quercetin, epigallocatechin gallate (EGCG), resveratrol, DIF, genistein, celecoxib (CELEBREX®), CWP232291, NSC668036, FJ9, BML-286 (3289-8625), IWP, IWP-1, IWP-2, JW55, G007-LK, pyrvinium, foxy-5, Wnt-5a, ipafricept (OMP-54F28), vantictumab (OMP-18R5), OTSA101, OTSA101-DTPA-90Y, SM04690, SM04755, nutlin-3a, XAV939, IWR1, JW74, okadaic acid, tautomycin, SB239063, SB203580, adenosine diphosphate
(hydroxymethyl)pyrrolidinediol (ADP-HPD), 2-[4-(4-fluorophenyl)piperazin-1-yl]-6-methylpyrimidin-4(3H)-one, PJ34, niclosamide (NICLOCIDE™), cambinol, sulindac (CLINORIL®, Aflodac), J01-017a, NSC668036, filipin, IC261, PF670462, bosutinib (BOSULIF®), PHA665752, imatinib (GLEEVEC®), ICG-001, ethacrynic acid, ethacrynic acid derivatives, pictilisib (GDC-0941), Rp-8-Br-cAMP, SDX-308, WNT974, CGX1321, ETC-1922159, AD-REIC/Dkk3, WIKI4, and windorphen. Other therapeutic interventions effective for treating a subject having a genetic biomarker in APC are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in APC is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in APC, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in EGFR, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in EGFR is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in EGFR is one or more of an EGFR-selective inhibitor, a panHER inhibitor, and an anti-EGFR antibody. In some embodiments, the EGFR inhibitor is a covalent inhibitor. In some embodiments, the EGFR inhibitor is a non-covalent inhibitor. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in EGFR is one or more of osimertinib (AZD9291, mereletinib, TAGRISSO™), erlotinib (TARCEVA®), gefitinib (IRESSA®), cetuximab (ERBITUX®), necitumumab (PORTRAZZA™, IMC-11F8), neratinib (HKI-272, NERLYNX®), lapatinib (TYKERB®), panitumumab (ABX-EGF, VECTIBIX®), vandetanib (CAPRELSA®), rociletinib (CO-1686), olmutinib (OLITA™, HM61713, BI-1482694), naquotinib (ASP8273), nazartinib (EGF816, NVS-816), PF-06747775, icotinib (BPI-2009H), afatinib (BIBW 2992, GILOTRIF®), dacomitinib (PF-00299804, PF-804, PF-299, PF-299804), avitinib (AC0010), AC0010MA, EAI045, matuzumab (EMD 72000), nimotuzumab (h-R3, BIOMAb EGFR®), zalutumumab, MDX447, depatuxizumab (humanized mAb 806, ABT-806), depatuxizumab mafodotin (ABT-414), ABT-806, mAb 806, canertinib (CI-1033), shikonin, shikonin derivatives (e.g., deoxyshikonin, isobutyrylshikonin, acetylshikonin, β,β-dimethylacrylshikonin, and acetylalkannin), poziotinib (NOV120101, HM781-36B), AV-412, ibrutinib, WZ4002, brigatinib (AP26113, ALUNBRIG®), pelitinib (EKB-569), tarloxotinib
(TH-4000, PR610), BPI-15086, Hemay022, ZN-e4, tesevatinib (KD019, XL647), YH25448, epitinib (HMPL-813), CK-101, MM-151, AZD3759, ZD6474, PF-06459988, varlintinib (ASLAN001, ARRY-334543), AP32788, HLX07, D-0316, AEE788, HS-10296, avitinib, GW572016, pyrotinib (SHR1258), SCT200, CPGJ602, Sym004, MAb-425, modotuximab (TAB-H49), futuximab (992 DS), zalutumumab, KL-140, RO5083945, IMGN289, JNJ-61186372, LY3164530, Sym013, AMG 595, EGFRBi-Armed Autologous T Cells, and EGFR CAR-T Therapy. Other therapeutic interventions effective for treating a subject having a genetic biomarker in EGFR are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in EGFR is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in EGFR, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in BRAF, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in BRAF is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in BRAF is one or more of vemurafenib (ZELBORAF®), dabrafenib (TAFINLAR®), encorafenib (BRAFTOVI™), BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, RO5185426, GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573, RO5126766, and LXH254. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in BRAF is a BRAF inhibitor and a MEK inhibitor. In some embodiments, the BRAF inhibitor is vemurafenib (ZELBORAF®), dabrafenib (TAFINLAR®), encorafenib (BRAFTOVI™), BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, RO5185426, GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573, RO5126766, or LXH254, and the MEK inhibitor is trametinib (MEKINIST®, GSK1120212), cobimetinib (COTELLIC®), binimetinib (MEKTOVI®, MEK162), selumetinib (AZD6244), PD0325901, MSC1936369B, SHR7390, TAK-733, RO5126766, CS3006, WX-554, PD98059, CI1040 (PD184352), or hypothemycin. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in BRAF is a BRAF inhibitor and an ERK inhibitor.
In some embodiments, the BRAF inhibitor is vemurafenib (ZELBORAF®), dabrafenib (TAFINLAR®), encorafenib (BRAFTOVI™), BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, RO5185426, GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573, RO5126766, or LXH254, and the ERK inhibitor is FRI-20 (ON-01060), VTX-11e, 25-OH-D3-3-BE (B3CD, bromoacetoxycalcidiol), FR-180204, AEZ-131 (AEZS-131), AEZS-136, AZ-13767370, BL-EI-001, LY-3214996, LTT-462, KO-947, MK-8353 (SCH900353), SCH772984, ulixertinib (BVD-523), CC-90003, GDC-0994 (RG-7482), ASN007, FR148083, 5Z-7-oxozeaenol, 5-iodotubercidin, GDC0994, or ONC201. Other therapeutic interventions effective for treating a subject having a genetic biomarker in BRAF are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in BRAF is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in BRAF, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in CDKN2A, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in CDKN2A is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in CDKN2A is a CDK4/6 inhibitor. In some embodiments, the CDK4/6 inhibitor is one or more of palbociclib, ribociclib, and abemaciclib. Other therapeutic interventions effective for treating a subject having a genetic biomarker in CDKN2A are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in CDKN2A is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in CDKN2A, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in CDKN2, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in CDKN2 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in CDKN2 is a CDK4/6 inhibitor. In some embodiments, the CDK4/6 inhibitor is one or more of palbociclib, ribociclib, and abemaciclib. Other therapeutic interventions effective for treating a subject having a genetic biomarker in CDKN2 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in CDKN2 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in CDKN2, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in PTEN, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in PTEN is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in PTEN is one or more of a PI3K/AKT/mTOR signaling pathway inhibitor and a PARP inhibitor. In some embodiments, the PI3K/AKT/mTOR signaling pathway inhibitor is one or more of a PI3K inhibitor, an AKT inhibitor, and an mTOR inhibitor. In some embodiments, the PI3K inhibitor is one or more of buparlisib (BKM120), alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946), dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604), sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835, GDC-0077, ASN003, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408), gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980), omipalisib (GSK2126458, GSK458), voxtalisib (XL765, SAR245409), AMG 511, CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402, wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301, KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771.
In some embodiments, the AKT inhibitor is one or more of miltefosine (IMPAVIDO®), wortmannin, NL-71-101, H-89, GSK690693, CCT128930, AZD5363, ipatasertib (GDC-0068, RG7440), A-674563, A-443654, AT7867, AT13148, uprosertib, afuresertib, DC120, 2-[4-(2-aminoprop-2-yl)phenyl]-3-phenylquinoxaline, MK-2206, edelfosine, miltefosine, perifosine, erucylphosphocholine, erufosine, SR13668, OSU-A9, PH-316, PHT-427, PIT-1, DM-PIT-1, triciribine (Triciribine Phosphate Monohydrate), API-1, N-(4-(5-(3-acetamidophenyl)-2-(2-aminopyridin-3-yl)-3H-imidazo[4,5-b]pyridin-3-yl)benzyl)-3-fluorobenzamide, ARQ092, BAY 1125976, 3-oxo-tirucallic acid, lactoquinomycin, boc-Phe-vinyl ketone, Perifosine (D-21266), TCN, TCN-P, GSK2141795, and ONC201. In some embodiments, the mTOR inhibitor is one or more of MLN0128, AZD-2014, CC-223, AZD2014, CC-115, everolimus (RAD001), temsirolimus (CCI-779), ridaforolimus (AP-23573), and sirolimus (rapamycin). In some embodiments, the farnesyl transferase inhibitor is one or more of lonafarnib, tipifarnib, BMS-214662, L778123, L744832, and FTI-277. In some embodiments, the PARP inhibitor is one or more of olaparib, veliparib, iniparib, rucaparib, CEP-9722, E7016, and E7449. Other therapeutic interventions effective for treating a subject having a genetic biomarker in PTEN are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in PTEN is effective in treating a cancer in the subject.
For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in PTEN, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in FGFR2, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in FGFR2 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in FGFR2 is one or more of an anti-FGFR2 antibody, an FGFR2 selective inhibitor, and a pan-FGFR inhibitor. In some embodiments, the therapeutic intervention is a covalent FGFR2 inhibitor (e.g., PRN1371, BLU9931, FIIN-4, H3B-6527, and FIIN-2). In some embodiments, the therapeutic intervention is a non-covalent FGFR2 inhibitor (e.g., AZD4547, BGJ398, Debio-1347, dovitinib, JNJ-42756493, and LY2874455). In some embodiments, the anti-FGFR2 antibody is GP369, BAY1187982, or FPA144 (bemarituzumab). In some embodiments, the therapeutic intervention is one or more of PRN1371, BLU9931, FIIN-4, H3B-6527, NVP-BGJ398, ARQ087, TAS-120, JNJ-42756493, CH5183284/Debio 1347, INCB054828, GP369, BAY1187982, FPA144 (bemarituzumab), NVP-BGJ398, JNJ-42756493 (erdafitinib), rogaratinib (BAY1163877), FIIN-2, JNJ-42756493, LY2874455, lenvatinib (E7080), ponatinib (AP24534), regorafenib (BAY 73-4506), dovitinib (TKI258), lucitanib (E3810), cediranib (AZD2171), intedanib (BIBF 1120), brivanib (BMS-540215), ASP5878, AZD4547, BGJ398 (infigratinib), E7090, HMPL-453, nintedanib (OFEV®, BIBF 1120), MAX-40279, XL999, orantinib (SU6668), pazopanib (VOTRIENT®), anlotinib, and AL3818. Other therapeutic interventions effective for treating a subject having a genetic biomarker in FGFR2 are known in the art.
In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in FGFR2 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in FGFR2, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in HRAS, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in HRAS is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in HRAS is one or more of a RAS-targeted therapeutic, a receptor tyrosine kinase inhibitor, a Ras-Raf-MEK-ERK pathway inhibitor, a PI3K-Akt-mTOR pathway inhibitor, and a farnesyl transferase inhibitor. In some embodiments, the Ras-Raf-MEK-ERK pathway inhibitor is one or more of a BRAF inhibitor, a MEK inhibitor, and an ERK inhibitor. In some embodiments, the BRAF inhibitor is one or more of vemurafenib (ZELBORAF®), dabrafenib (TAFINLAR®), encorafenib (BRAFTOVI™), BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, RO5185426, GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573, RO5126766, and LXH254. In some embodiments, the MEK inhibitor is one or more of trametinib (MEKINIST®, GSK1120212), cobimetinib (COTELLIC®), binimetinib (MEKTOVI®, MEK162), selumetinib (AZD6244), PD0325901, MSC1936369B, SHR7390, TAK-733, RO5126766, CS3006, WX-554, PD98059, CI1040 (PD184352), and hypothemycin. In some embodiments, the ERK inhibitor is one or more of FRI-20 (ON-01060), VTX-11e, 25-OH-D3-3-BE (B3CD, bromoacetoxycalcidiol), FR-180204, AEZ-131 (AEZS-131), AEZS-136, AZ-13767370, BL-EI-001, LY-3214996, LTT-462, KO-947, MK-8353 (SCH900353), SCH772984, ulixertinib (BVD-523), CC-90003, GDC-0994 (RG-7482), ASN007, FR148083, 5Z-7-oxozeaenol, 5-iodotubercidin, GDC0994, and ONC201.
In some embodiments, the PI3K-Akt-mTOR pathway inhibitor is one or more of a PI3K inhibitor, an AKT inhibitor, and an mTOR inhibitor. In some embodiments, the PI3K inhibitor is one or more of buparlisib (BKM120), alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946), dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604), sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835, GDC-0077, ASN003, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408), gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980), omipalisib (GSK2126458, GSK458), voxtalisib (XL765, SAR245409), AMG 511, CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402, wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301, KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771. In some embodiments, the AKT inhibitor is one or more of miltefosine (IMPAVIDO®), wortmannin, NL-71-101, H-89, GSK690693, CCT128930, AZD5363, ipatasertib (GDC-0068, RG7440), A-674563, A-443654, AT7867, AT13148, uprosertib, afuresertib, DC120, 2-[4-(2-aminoprop-2-yl)phenyl]-3-phenylquinoxaline, MK-2206, edelfosine, miltefosine, perifosine, erucylphosphocholine, erufosine, SR13668, OSU-A9, PH-316, PHT-427, PIT-1, DM-PIT-1, triciribine (Triciribine Phosphate Monohydrate), API-1, N-(4-(5-(3-acetamidophenyl)-2-(2-aminopyridin-3-yl)-3H-imidazo[4,5-b]pyridin-3-yl)benzyl)-3-fluorobenzamide, ARQ092, BAY 1125976, 3-oxo-tirucallic acid, lactoquinomycin, boc-Phe-vinyl ketone, Perifosine (D-21266), TCN, TCN-P, GSK2141795, and ONC201. In some embodiments, the mTOR inhibitor is one or more of MLN0128, AZD-2014, CC-223, AZD2014, CC-115, everolimus (RAD001), temsirolimus (CCI-779), ridaforolimus (AP-23573), and sirolimus (rapamycin). In some embodiments, the farnesyl transferase inhibitor is one or more of lonafarnib, tipifarnib, BMS-214662, L778123, L744832, and FTI-277.
In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in HRAS is a MEK inhibitor and a PI3K inhibitor. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in HRAS is a MEK inhibitor and an ERK inhibitor. Other therapeutic interventions effective for treating a subject having a genetic biomarker in HRAS are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in HRAS is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in HRAS, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a geneticbiomarker (e.g., a mutation) in KRAS, the subject is administered atherapeutic intervention. In some embodiments, a subject identified ashaving a genetic biomarker (e.g., a mutation) in KRAS is identified ashaving a cancer (e.g., based on the presence of the genetic biomarker,either alone or in combination with the presence of other geneticbiomarkers and/or the presence of one or more members of other classesof biomarkers and/or the presence of aneuploidy as described herein). Insome embodiments, the therapeutic intervention administered to thesubject having a genetic biomarker in KRAS is one or more of aRAS-targeted therapeutic, a receptor tyrosine kinase inhibitor, aRas-Raf-MEK-ERK pathway inhibitor, a PI3K-Akt-mTOR pathway inhibitor,and a farnesyl transferase inhibitor. In some embodiments, theRAS-targeted therapeutic is one or more of SML-10-70-4 and AA12. In someembodiments, the Ras-Raf-MEK-ERK pathway inhibitor is one or more of aBRAF inhibitor, a MEK inhibitor, and an ERK inhibitor. In someembodiments, the BRAF inhibitor is one or more of vemurafenib(ZELBORAF®), dabrafenib (TAFINLAR®), and encorafenib (BRAFTOVI™),BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, R05185426,GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573,R05126766, and LXH254. In some embodiments, the MEK inhibitor is one ormore of trametinib (MEKINIST®, GSK1120212), cobimetinib (COTELLIC®),binimetinib (MEKTOVI®, MEK162), selumetinib (AZD6244), PD0325901,MSC1936369B, SHR7390, TAK-733, R05126766, CS3006, WX-554, PD98059,CI1040 (PD184352), and hypothemycin. 
In some embodiments, the ERKinhibitor is one or more of FRI-20 (ON-01060), VTX-11e, 25-0H-D3-3-BE(B3CD, bromoacetoxycalcidiol), FR-180204, AEZ-131 (AEZS-131), AEZS-136,AZ-13767370, BL-EI-001, LY-3214996, LTT-462, KO-947, KO-947, MK-8353(SCH900353), SCH772984, ulixertinib (BVD-523), CC-90003, GDC-0994(RG-7482), ASNO07, FR148083, 5-7-Oxozeaenol, 5-iodotubercidin, GDC0994,and ONC201. In some embodiments, the PI3K-Akt-mTOR pathway inhibitor isone or more of a PI3K inhibitor, an AKT inhibitor, and a mTOR inhibitor.In some embodiments, the PI3K inhibitor is one or more of buparlisib(BKM120), alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946),dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604),sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835,GDC-0077, ASNO03, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408),gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980),omipalisib (GSK2126458, GSK458), voxtalisib (XL756, SAR245409), AMG 511,CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402,wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301,KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771. In someembodiments, the AKT inhibitor is one or more of miltefosine(IMPADIVO®), wortmannin, NL-71-101, H-89, GSK690693, CCT128930, AZD5363,ipatasertib (GDC-0068, RG7440), A-674563, A-443654, AT7867, AT13148,uprosertib, afuresertib, DC120,2-[4-(2-aminoprop-2-yl)phenyl]-3-phenylquinoxaline, MK-2206, edelfosine,miltefosine, perifosine, erucylphophocholine, erufosine, SR13668, 0SU-A9, PH-316, PHT-427, PIT-1, DM-PIT-1, triciribine (TriciribinePhosphate Monohydrate), API-1,N-(4-(5-(3-acetamidophenyl)-2-(2-aminopyridin-3-yl)-3H-imidazo[4,5-b]pyridin-3-yl)benzyl)-3-fluorobenzamide, ARQ092, BAY 1125976,3-oxo-tirucallic acid, lactoquinomycin, boc-Phe-vinyl ketone, Perifosine(D-21266), TCN, TCN-P, GSK2141795, and ONC201. 
In some embodiments, the mTOR inhibitor is one or more of MLN0128, AZD-2014, CC-223, CC-115, everolimus (RAD001), temsirolimus (CCI-779), ridaforolimus (AP-23573), and sirolimus (rapamycin). In some embodiments, the farnesyl transferase inhibitor is one or more of lonafarnib, tipifarnib, BMS-214662, L778123, L744832, and FTI-277. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in KRAS is a MEK inhibitor and a PI3K inhibitor. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in KRAS is a MEK inhibitor and an ERK inhibitor. Other therapeutic interventions effective for treating a subject having a genetic biomarker in KRAS are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in KRAS is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in KRAS, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.
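
The nesting recited for KRAS (an intervention class, its inhibitor sub-classes, and example agents within each sub-class) can be pictured as a small lookup structure. The following Python sketch is illustrative only: the names PATHWAY_SUBCLASSES, EXAMPLE_AGENTS, and expand_class are hypothetical, and the agent lists are a short subset of the agents named above rather than the full recited lists.

```python
# Hypothetical illustration of the class -> sub-class -> agent nesting.
# The agents shown are a small subset of those recited in the text.
PATHWAY_SUBCLASSES = {
    "Ras-Raf-MEK-ERK pathway inhibitor": [
        "BRAF inhibitor", "MEK inhibitor", "ERK inhibitor",
    ],
    "PI3K-Akt-mTOR pathway inhibitor": [
        "PI3K inhibitor", "AKT inhibitor", "mTOR inhibitor",
    ],
}

EXAMPLE_AGENTS = {
    "RAS-targeted therapeutic": ["SML-10-70-4", "AA12"],
    "BRAF inhibitor": ["vemurafenib", "dabrafenib", "encorafenib"],
    "MEK inhibitor": ["trametinib", "cobimetinib", "binimetinib", "selumetinib"],
    "ERK inhibitor": ["ulixertinib", "SCH772984"],
    "PI3K inhibitor": ["buparlisib", "alpelisib", "copanlisib"],
    "AKT inhibitor": ["ipatasertib", "MK-2206"],
    "mTOR inhibitor": ["everolimus", "temsirolimus", "sirolimus"],
    "farnesyl transferase inhibitor": ["lonafarnib", "tipifarnib"],
}


def expand_class(name):
    """Return {leaf_class: [agents]} for a class, descending into sub-classes."""
    subclasses = PATHWAY_SUBCLASSES.get(name)
    if subclasses is None:
        # Leaf class: return its example agents (empty if none are listed here).
        return {name: list(EXAMPLE_AGENTS.get(name, []))}
    expanded = {}
    for sub in subclasses:
        expanded.update(expand_class(sub))
    return expanded


if __name__ == "__main__":
    for leaf, agents in expand_class("Ras-Raf-MEK-ERK pathway inhibitor").items():
        print(leaf, "->", ", ".join(agents))
```

A class with no agents listed in this sketch (for example, a receptor tyrosine kinase inhibitor) simply expands to an empty list; the sketch does not rank or select among agents.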

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in AKT1, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in AKT1 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in AKT1 is one or more of miltefosine (IMPAVIDO®), wortmannin, NL-71-101, H-89, GSK690693, CCT128930, AZD5363, ipatasertib (GDC-0068, RG7440), A-674563, A-443654, AT7867, AT13148, uprosertib, afuresertib, DC120, 2-[4-(2-aminoprop-2-yl)phenyl]-3-phenylquinoxaline, MK-2206, edelfosine, perifosine (D-21266), erucylphosphocholine, erufosine, SR13668, OSU-A9, PH-316, PHT-427, PIT-1, DM-PIT-1, triciribine (Triciribine Phosphate Monohydrate), API-1, N-(4-(5-(3-acetamidophenyl)-2-(2-aminopyridin-3-yl)-3H-imidazo[4,5-b]pyridin-3-yl)benzyl)-3-fluorobenzamide, ARQ092, BAY 1125976, 3-oxo-tirucallic acid, lactoquinomycin, boc-Phe-vinyl ketone, TCN, TCN-P, GSK2141795, and ONC201. Other therapeutic interventions effective for treating a subject having a genetic biomarker in AKT1 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in AKT1 is effective in treating a cancer in the subject. 
For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in AKT1, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a geneticbiomarker (e.g., a mutation) in TP53, the subject is administered atherapeutic intervention. In some embodiments, a subject identified ashaving a genetic biomarker (e.g., a mutation) in TP53 is identified ashaving a cancer (e.g., based on the presence of the genetic biomarker,either alone or in combination with the presence of other geneticbiomarkers and/or the presence of one or more members of other classesof biomarkers and/or the presence of aneuploidy as described herein). Insome embodiments, the therapeutic intervention administered to thesubject having a genetic biomarker in TP53 is one or more of p53reactivation and induction of massive apoptosis-1 (PRIMA-1), APR-246(PRIMA-1^(MET)) 2-sulfonylpyrimidines such as PK11007, pyrazoles such asPK7088, zinc metallochaperone-1 (ZMC1; NSC319726/ZMC 1), athiosemicarbazone (e.g., COTI-2), CP-31398, STIMA-1 (SH Group-TargetingCompound That Induces Massive Apoptosis), MIRA-1 (NSC19630) and itsanalogs MIRA-2 and -3, RITA (NSC652287), Chetomin (CTM), PK7088, Sticticacid (NSC87511), p53R3, SCH529074, WR-1065, Hsp90 inhibitors (e.g.,17-AAG, geldanamycin, ganetespib, AUY922, IPI-504), HDAC inhibitors(e.g., vorinostat/SAHA, romidepsin/depsipeptide, HBI-8000), arseniccompounds, gambogic acid, spautin-1, YK-3-237, NSC59984, disulfiram(DSF), gentamicin, G418, and amikamicin, reactivate transcriptionalactivity (RETRA), PD0166285, inhibitors of MDM2 (e.g., RG7112(R05045337), R05503781, MI-773 (SAR405838), DS-3032b, AM-8553, AMG 232,MI-219, MI-713, MI-888, TDP521252, NSC279287, PXN822, SAH-8 (stapledpeptides), ATSP-7041, spiroligomer, PK083, PK5174, PK5196, PK7088,nutlin 3a, RG7388, Ro-2443, stictic acid, and NSC319726), and inhibitorsof MDM4. Other therapeutic interventions effective for treating asubject having a genetic biomarker in TP53 are known in the art. 
In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in TP53 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in TP53, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in PPP2R1A, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in PPP2R1A is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in PPP2R1A is one or more of activators of PP2A such as SET inhibitors (e.g., FTY-720, ceramide, and OP449). Other therapeutic interventions effective for treating a subject having a genetic biomarker in PPP2R1A are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in PPP2R1A is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in PPP2R1A, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in GNAS, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in GNAS is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in GNAS is one or more of a Ras-Raf-MEK-ERK pathway inhibitor and a WNT/β-catenin signaling inhibitor. In some embodiments, the Ras-Raf-MEK-ERK pathway inhibitor is one or more of a BRAF inhibitor, a MEK inhibitor, and an ERK inhibitor. In some embodiments, the BRAF inhibitor is one or more of vemurafenib (ZELBORAF®), dabrafenib (TAFINLAR®), encorafenib (BRAFTOVI™), BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, RO5185426, GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573, RO5126766, and LXH254. In some embodiments, the MEK inhibitor is one or more of trametinib (MEKINIST®, GSK1120212), cobimetinib (COTELLIC®), binimetinib (MEKTOVI®, MEK162), selumetinib (AZD6244), PD0325901, MSC1936369B, SHR7390, TAK-733, RO5126766, CS3006, WX-554, PD98059, CI1040 (PD184352), and hypothemycin. In some embodiments, the ERK inhibitor is one or more of FRI-20 (ON-01060), VTX-11e, 25-OH-D3-3-BE (B3CD, bromoacetoxycalcidiol), FR-180204, AEZ-131 (AEZS-131), AEZS-136, AZ-13767370, BL-EI-001, LY-3214996, LTT-462, KO-947, MK-8353 (SCH900353), SCH772984, ulixertinib (BVD-523), CC-90003, GDC-0994 (RG-7482), ASN007, FR148083, 5Z-7-oxozeaenol, 5-iodotubercidin, and ONC201. 
In some embodiments, the WNT/β-catenin signaling inhibitor is one or more of PRI-724, CWP232291, PNU74654, 2,4 diamino-quinazoline, PKF115-584, PKF118-744, PKF118-310, PKF222-815, CGP 049090, ZTM000990, BC21, vitamin D, retinoic acid, aspirin, sulindac (CLINORIL®, Aflodac), 2,4 diamino-quinazoline derivatives, methyl 3-{[(4-methylphenyl)sulfonyl]amino}benzoate (MSAB), AV65, iCRT3, iCRT5, iCRT14, SM04554, LGK 974, XAV939, curcumin (e.g., Meriva®), quercetin, epigallocatechin gallate (EGCG), resveratrol, DIF, genistein, celecoxib (CELEBREX®), NSC668036, FJ9, BML-286 (3289-8625), IWP, IWP-1, IWP-2, JW55, G007-LK, pyrvinium, foxy-5, Wnt-5a, ipafricept (OMP-54F28), vantictumab (OMP-18R5), OTSA_101, OTSA101-DTPA-90Y, SM04690, SM04755, nutlin-3a, IWR1, JW74, okadaic acid, tautomycin, SB239063, SB203580, adenosine diphosphate (hydroxymethyl)pyrrolidinediol (ADP-HPD), 2-[4-(4-fluorophenyl)piperazin-1-yl]-6-methylpyrimidin-4(3H)-one, PJ34, niclosamide (NICLOCIDE™), cambinol, J01-017a, filipin, IC261, PF670462, bosutinib (BOSULIF®), PHA665752, imatinib (GLEEVEC®), ICG-001, ethacrynic acid, ethacrynic acid derivatives, pictilisib (GDC-0941), Rp-8-Br-cAMP, SDX-308, WNT974, CGX1321, ETC-1922159, AD-REIC/Dkk3, WIKI4, and windorphen. Other therapeutic interventions effective for treating a subject having a genetic biomarker in GNAS are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in GNAS is effective in treating a cancer in the subject. 
For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in GNAS, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a geneticbiomarker (e.g., a mutation) in SMAD4, the subject is administered atherapeutic intervention. In some embodiments, a subject identified ashaving a genetic biomarker (e.g., a mutation) in SMAD4 is identified ashaving a cancer (e.g., based on the presence of the genetic biomarker,either alone or in combination with the presence of other geneticbiomarkers and/or the presence of one or more members of other classesof biomarkers and/or the presence of aneuploidy as described herein). Insome embodiments, the therapeutic intervention administered to thesubject having a genetic biomarker in SMAD4 is one or more of a PI3Kinhibitor, antiangiogenic therapy, and 5-FU-based chemotherapy. In someembodiments, the PI3K inhibitor is one or more of buparlisib (BKM120),alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946),dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604),sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835,GDC-0077, ASNO03, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408),gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980),omipalisib (GSK2126458, GSK458), voxtalisib (XL756, SAR245409), AMG 511,CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402,wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301,KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771. 
In some embodiments, the antiangiogenic therapy is an inhibitor of one or more of VEGFR1, VEGFR, VEGFR2, VEGFA, CDH5, EDNRA, ANGPT2, CD34, and ANGPT. In some embodiments, the antiangiogenic therapy is one or more of vatalanib (PTK787/ZK222584), TKI-538, sunitinib (SU11248, SUTENT®), pazopanib (VOTRIENT®), bevacizumab (AVASTIN®), thalidomide, lenalidomide (REVLIMID®), ranibizumab, EYE001, and axitinib (AG013736, INLYTA®). Other therapeutic interventions effective for treating a subject having a genetic biomarker in SMAD4 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in SMAD4 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in SMAD4, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in POLE, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in POLE is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in POLE is one or more of immunotherapy and an immune checkpoint inhibitor. Other therapeutic interventions effective for treating a subject having a genetic biomarker in POLE are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in POLE is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in POLE, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in RNF43, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in RNF43 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in RNF43 is one or more of autologous RNF43 peptide-pulsed dendritic cells (DCs), RNF43 peptide-pulsed DCs, systemic low dose interleukin-2, and PORCN inhibitors. In some embodiments, the PORCN inhibitor is one or more of RXC0004, ETC-1922159, ETC-159, IWP-2, LGK974, and WNT-059. Other therapeutic interventions effective for treating a subject having a genetic biomarker in RNF43 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in RNF43 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in RNF43, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a geneticbiomarker (e.g., a mutation) in MAPK1, the subject is administered atherapeutic intervention. In some embodiments, a subject identified ashaving a genetic biomarker (e.g., a mutation) in MAPK1 is identified ashaving a cancer (e.g., based on the presence of the genetic biomarker,either alone or in combination with the presence of other geneticbiomarkers and/or the presence of one or more members of other classesof biomarkers and/or the presence of aneuploidy as described herein). Insome embodiments, the therapeutic intervention administered to thesubject having a genetic biomarker in MAPK1 is one or more of an ERKinhibitor, a MEK inhibitor, an ERBB-receptor inhibitor (e.g., an EGFRinhibitor or a HER2 inhibitor), or PI3K-Akt-mTOR pathway inhibitor. Insome embodiments, the ERK inhibitor is one or more of FRI-20 (ON-01060),VTX-11e, 25-OH-D3-3-BE (B3CD, bromoacetoxycalcidiol), FR-180204, AEZ-131(AEZS-131), AEZS-136, AZ-13767370, BL-EI-001, LY-3214996, LTT-462,KO-947, KO-947, MK-8353 (SCH900353), SCH772984, ulixertinib (BVD-523),CC-90003, GDC-0994 (RG-7482), ASNO07, FR148083, 5-7-Oxozeaenol,5-iodotubercidin, GDC0994, ONC201. In some embodiments, the MEKinhibitor is one or more of trametinib (MEKINIST®, GSK1120212),cobimetinib (COTELLIC®), binimetinib (MEKTOVI®, MEK162), selumetinib(AZD6244), PD0325901, MSC1936369B, SHR7390, TAK-733, R05126766, CS3006,WX-554, PD98059, CI1040 (PD184352), and hypothemycin. In someembodiments, the PI3K-Akt-mTOR pathway inhibitor is one or more of aPI3K inhibitor, an AKT inhibitor, and an mTOR inhibitor. 
In someembodiments, the PI3K inhibitor is one or more of buparlisib (BKM120),alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946),dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604),sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835,GDC-0077, ASNO03, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408),gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980),omipalisib (GSK2126458, GSK458), voxtalisib (XL756, SAR245409), AMG 511,CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402,wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301,KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771. In someembodiments, the AKT inhibitor is one or more of miltefosine(IMPADIVO®), wortmannin, NL-71-101, H-89, GSK690693, CCT128930, AZD5363,ipatasertib (GDC-0068, RG7440), A-674563, A-443654, AT7867, AT13148,uprosertib, afuresertib, DC120,2-[4-(2-aminoprop-2-yl)phenyl]-3-phenylquinoxaline, MK-2206, edelfosine,miltefosine, perifosine, erucylphophocholine, erufosine, SR13668, 0SU-A9, PH-316, PHT-427, PIT-1, DM-PIT-1, triciribine (TriciribinePhosphate Monohydrate), API-1,N-(4-(5-(3-acetamidophenyl)-2-(2-aminopyridin-3-yl)-3H-imidazo[4,5-b]pyridin-3-yl)benzyl)-3-fluorobenzamide, ARQ092, BAY 1125976,3-oxo-tirucallic acid, lactoquinomycin, boc-Phe-vinyl ketone, Perifosine(D-21266), TCN, TCN-P, GSK2141795, and ONC201. In some embodiments, themTOR inhibitor is one or more of MLN0128, AZD-2014, CC-223, AZD2014,CC-115, everolimus (RAD001), temsirolimus (CCI-779), ridaforolimus(AP-23573), and sirolimus (rapamycin). 
In some embodiments, the HER2inhibitor is one or more of AZD8931, AST1306, AEE788, CP724714, CUDC101,TAK285, dacomitinib, pelitinib, AC480, trastuzumab (HERCEPTIN®),pertuzumab (PERJETA®), trastuzumab-dkst (OGIVRI®), DXL-702, E-75,PX-104.1, ZW25, CP-724714, irbinitinib (ARRY-380, ONT-380), TAS0728,lapatinib (TYKERB®, TYVERB®), AST-1306, AEE-788, perlitinib (EKB-569),afatinib (BIBW 2992, GILOTRIF®), neratinib (HKI-272, NERLYNX®, PKI-166,D-69491, HKI-357, AP32788, GW572016, canertinib (CI-1033), AC-480(BMS-599626), dacomitinib (PF299804, PF299), RB-200h, ARRY-334543(ARRY-543, ASLAN001), poziotinib (NOV120101), CUDC-101, emodin, IDM-1,ado-trastuzumab emtansine (KADCYLA®), Zemab, DS-8201a, T-DM1, anti-HER2CAR-T therapy, HER2-Peptid-Vakzine, and HER2Bi-Armed Activated T Cells.In some embodiments, the EGFR inhibitor is osimertinib (AZD9291,merelectinib, TAGRISSO™), erlotinib (TARCEVA®), gefitinib (IRESSA®),cetuximab (ERBITUX®), necitumumab (PORTRAZZA™, IMC-11F8), neratinib(HKI-272, NERLYNX®), lapatinib (TYKERB®), panitumumab (ABX-EGF,VECTIBIX®), vandetanib (CAPRELSA®), rociletinib (CO-1686), olmutinib(OLITA™, HM61713, BI-1482694), naquotinib (ASP8273), nazartinib (EGF816,NVS-816), PF-06747775, icotinib (BPI-2009H), afatinib (BIBW 2992,GILOTRIF®), dacomitinib (PF-00299804, PF-804, PF-299, PF-299804),avitinib (AC0010), AC0010MA EAI045, matuzumab (EMD-7200), nimotuzumab(h-R3, BIOMAb EGFR®), zalutumab, MDX447, depatuxizumab (humanized mAb806, ABT-806), depatuxizumab mafodotin (ABT-414), ABT-806, mAb 806,canertinib (CI-1033), shikonin, shikonin derivatives (e.g.,deoxyshikonin, isobutyrylshikonin, acetylshikonin,β,β-dimethylacrylshikonin and acetylalkannin), poziotinib (NOV120101,HM781-36B), AV-412, ibrutinib, WZ4002, brigatinib (AP26113, ALUNBRIG®),pelitinib (EKB-569), tarloxotinib (TH-4000, PR610), BPI-15086, Hemay022,ZN-e4, tesevatinib (KDO19, XL647), YH25448, epitinib (HMPL-813), CK-101,MM-151, AZD3759, ZD6474, PF-06459988, varlintinib (ASLAN001,ARRY-334543), AP32788, 
HLX07, D-0316, AEE788, HS-10296, avitinib, GW572016, pyrotinib (SHR1258), SCT200, CPGJ602, Sym004, MAb-425, Modotuximab (TAB-H49), futuximab (992 DS), zalutumumab, KL-140, RO5083945, IMGN289, JNJ-61186372, LY3164530, Sym013, AMG 595, EGFRBi-Armed Autologous T Cells, and EGFR CAR-T Therapy. Other therapeutic interventions effective for treating a subject having a genetic biomarker in MAPK1 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in MAPK1 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in MAPK1, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in PI3KR1, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in PI3KR1 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in PI3KR1 is one or more of a panPI3K inhibitor, a dual PI3K and mTOR inhibitor, and a Ras-Raf-MEK-ERK pathway inhibitor. In some embodiments, the panPI3K inhibitor is buparlisib (BKM120), copanlisib (ALIQOPA™, BAY80-6946), sonolisib (PX-866), ZSTK474, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408), AMG 511, PKI-402, wortmannin, LY294002, and WX-037. In some embodiments, the PI3K and mTOR dual inhibitor is dactolisib (NVP-BEZ235, BEZ-235), PQR309, SF1126, gedatolisib (PF-05212384, PKI-587), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980), omipalisib (GSK2126458, GSK458), voxtalisib (XL756, SAR245409), GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), and PI-103. 
In some embodiments,the therapeutic intervention administered to the subject having agenetic biomarker in PIK3CA is one or more of buparlisib (BKM120),alpelisib (BYL719), WX-037, copanlisib (ALIQOPA™, BAY80-6946),dactolisib (NVP-BEZ235, BEZ-235), taselisib (GDC-0032, RG7604),sonolisib (PX-866), CUDC-907, PQR309, ZSTK474, SF1126, AZD8835,GDC-0077, ASNO03, pictilisib (GDC-0941), pilaralisib (XL147, SAR245408),gedatolisib (PF-05212384, PKI-587), serabelisib (TAK-117, MLN1117, INK1117), BGT-226 (NVP-BGT226), PF-04691502, apitolisib (GDC-0980),omipalisib (GSK2126458, GSK458), voxtalisib (XL756, SAR245409), AMG 511,CH5132799, GSK1059615, GDC-0084 (RG7666), VS-5584 (SB2343), PKI-402,wortmannin, LY294002, PI-103, rigosertib, XL-765, LY2023414, SAR260301,KIN-193 (AZD-6428), GS-9820, AMG319, and GSK2636771. In someembodiments, the Ras-Raf-MEK-ERK pathway inhibitor is one or more of aBRAF inhibitor, a MEK inhibitor, and an ERK inhibitor. In someembodiments, the BRAF inhibitor is one or more of vemurafenib(ZELBORAF®), dabrafenib (TAFINLAR®), and encorafenib (BRAFTOVI™),BMS-908662 (XL281), sorafenib, LGX818, PLX3603, RAF265, R05185426,GSK2118436, ARQ 736, GDC-0879, PLX-4720, AZ304, PLX-8394, HM95573,R05126766, and LXH254. In some embodiments, the MEK inhibitor is one ormore of trametinib (MEKINIST®, GSK1120212), cobimetinib (COTELLIC®),binimetinib (MEKTOVI®, MEK162), selumetinib (AZD6244), PD0325901,MSC1936369B, SHR7390, TAK-733, R05126766, CS3006, WX-554, PD98059,CI1040 (PD184352), and hypothemycin. In some embodiments, the ERKinhibitor is one or more of FRI-20 (ON-01060), VTX-11e, 25-OH-D3-3-BE(B3CD, bromoacetoxycalcidiol), FR-180204, AEZ-131 (AEZS-131), AEZS-136,AZ-13767370, BL-EI-001, LY-3214996, LTT-462, KO-947, KO-947, MK-8353(SCH900353), SCH772984, ulixertinib (BVD-523), CC-90003, GDC-0994(RG-7482), ASNO07, FR148083, 5-7-Oxozeaenol, 5-iodotubercidin, GDC0994,and ONC201. 
Other therapeutic interventions effective for treating asubject having a genetic biomarker in PI3KR1 are known in the art. Insome embodiments, a therapeutic intervention administered to the subjecthaving a genetic biomarker in PI3KR1 is effective in treating a cancerin the subject. For example, after administration of a therapeuticintervention that is effective in treating a subject having a geneticbiomarker in PI3KR1, the number of cancer cells in the subject can bereduced, the size of one or more tumors in the subject can be reduced,the rate or extent of metastasis can be reduced, symptoms associatedwith the disease or disorder or condition can be wholly or partlyalleviated, the state of the disease can be stabilized (i.e., notworsened), and/or survival can be prolonged as compared to expectedsurvival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in FGFR3, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in FGFR3 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in FGFR3 is one or more of an anti-FGFR3 antibody, an FGFR3 selective inhibitor, and a pan-FGFR inhibitor. In some embodiments, the therapeutic intervention is a covalent FGFR inhibitor (e.g., PRN1371, BLU9931, FIIN-4, H3B-6527, and FIIN-2). In some embodiments, the therapeutic intervention is a non-covalent FGFR inhibitor (e.g., AZD4547, BGJ398, Debio-1347, dovitinib, JNJ-42756493, and LY2874455). In some embodiments, the anti-FGFR3 antibody is MFGR1877S or B-701. In some embodiments, the therapeutic intervention is one or more of MFGR1877S, B-701, FP-1039 (GSK230), NVP-BGJ398, JNJ-42756493 (erdafitinib), rogaratinib (BAY1163877), FIIN-2, LY2874455, lenvatinib (E7080), ponatinib (AP24534), regorafenib (BAY 73-4506), dovitinib (TKI258), lucitanib (E3810), cediranib (AZD2171), intedanib (BIBF 1120), brivanib (BMS-540215), ASP5878, AZD4547, BGJ398 (infigratinib), Debio-1347, E7090, HMPL-453, nintedanib (OFEV®, BIBF 1120), MAX-40279, XL999, orantinib (SU6668), pazopanib (VOTRIENT®), anlotinib, and AL3818. Other therapeutic interventions effective for treating a subject having a genetic biomarker in FGFR3 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in FGFR3 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in FGFR3, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in ERBB2, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in ERBB2 is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in ERBB2 is one or more of an anti-ERBB2 antibody, a selective ERBB2 inhibitor, and a pan-ERBB inhibitor. In some embodiments, the therapeutic intervention is a covalent ERBB2 inhibitor. In some embodiments, the therapeutic intervention is a non-covalent ERBB2 inhibitor. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in ERBB2 is one or more of AZD8931, AST1306, AEE788, CP724714, CUDC101, TAK285, dacomitinib, pelitinib, AC480, trastuzumab (HERCEPTIN®), pertuzumab (PERJETA®), trastuzumab-dkst (OGIVRI®), DXL-702, E-75, PX-104.1, ZW25, CP-724714, irbinitinib (ARRY-380, ONT-380), TAS0728, lapatinib (TYKERB®, TYVERB®), AST-1306, AEE-788, perlitinib (EKB-569), afatinib (BIBW 2992, GILOTRIF®), neratinib (HKI-272, NERLYNX®), PKI-166, D-69491, HKI-357, AP32788, GW572016, canertinib (CI-1033), AC-480 (BMS-599626), dacomitinib (PF299804, PF299), RB-200h, ARRY-334543 (ARRY-543, ASLAN001), poziotinib (NOV120101), CUDC-101, emodin, IDM-1, ado-trastuzumab emtansine (KADCYLA®), Zemab, DS-8201a, T-DM1, anti-HER2 CAR-T therapy, HER2-Peptid-Vakzine, and HER2Bi-Armed Activated T Cells. Other therapeutic interventions effective for treating a subject having a genetic biomarker in ERBB2 are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in ERBB2 is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in ERBB2, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in MLL, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in MLL is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in MLL is one or more of cytosine arabinoside, all-trans retinoic acid (ATRA), an HDAC inhibitor (e.g., valproic acid and HBI-8000), a DNA methyltransferase inhibitor (e.g., decitabine), an LSD1 inhibitor (e.g., ORY1001 (RG6016), GSK2879552, INCB059872, IMG7289, and CC90011), menin 1 inhibitors (e.g., MI1, MI2, MI3, Mi2-2 (MI-2-2), MI463, MI503, and MIV-6R), DOT1L (histone-lysine KMT) inhibitors (e.g., EPZ004777, EPZ-5676, SGC0946, CN-SAH, SYC-522, SAH, and SYC-534), and WDR5-MLL antagonists (e.g., MM-101, MM-102, MM-103, MM-401, WDR5-0101, WDR5-0102, WDR5-0103, and OICR-9429). Other therapeutic interventions effective for treating a subject having a genetic biomarker in MLL are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in MLL is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in MLL, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in MET, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in MET is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in MET is one or more of a MET inhibitor, an HGF antagonist, an anti-HGF antibody (e.g., rilotumumab (AMG102), ficlatuzumab (AV-299), TAK701, and YYB101), and a multikinase inhibitor (e.g., tivantinib (ARQ 197), golvatinib (E7050), cabozantinib (XL 184, BMS-907351), foretinib (GSK1363089), crizotinib (PF-02341066), MK-2461, BPI-9016M, TQ-B3139, MGCD265, and MK-8033). In some embodiments, the MET inhibitor is one or more of capmatinib (INC280, INCB28060), onartuzumab (MetMAb), savolitinib, tepotinib (MSC2156119J, EMD1214063), CE-35562, AMG-337, AMG-458, foretinib, PHA-665725, MK-2461, PF-04217903 and SU11274, SU11274 and PHA-665752, SAIT301, HS-10241, ARGX-111, MSC2156119J, glumetinib (SCC244), EMD 1204831, AZD6094 (savolitinib, volitinib, HMPL-504), PLB1001, ABT-700, AMG 208, INCB028060, AL2846, and PF-04217903. In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in MET is one or more of capmatinib (INC280, INCB28060), onartuzumab (MetMAb), savolitinib, tepotinib (MSC2156119J, EMD1214063), CE-35562, AMG-337, AMG-458, foretinib, PHA-665725, MK-2461, PF-04217903 and SU11274, SU11274 and PHA-665752, SAIT301, HS-10241, ARGX-111, MSC2156119J, glumetinib (SCC244), EMD 1204831, AZD6094 (savolitinib, volitinib, HMPL-504), PLB1001, ABT-700, AMG 208, INCB028060, AL2846, PF-04217903, rilotumumab (AMG102), ficlatuzumab (AV-299), TAK701, YYB101, tivantinib (ARQ 197), golvatinib (E7050), cabozantinib (XL 184, BMS-907351), foretinib (GSK1363089), crizotinib (PF-02341066), MK-2461, BPI-9016M, TQ-B3139, MGCD265, MK-8033, ABBV-399, HTI-1066, and JNJ-61186372. Other therapeutic interventions effective for treating a subject having a genetic biomarker in MET are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in MET is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in MET, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in VHL, the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in VHL is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in VHL is one or more of an antiangiogenic therapy (e.g., inhibitors of one or more of VEGFR1, VEGFR, VEGFR2, VEGFA, CDH5, EDNRA, ANGPT2, CD34, and ANGPT), vatalanib (PTK787/ZK222584), TKI-538, sunitinib (SU11248, SUTENT®), pazopanib (VOTRIENT®), bevacizumab (AVASTIN®), thalidomide, lenalidomide (REVLIMID®), ranibizumab, EYE001, axitinib (AG013736, INLYTA®), a c-KIT inhibitor (e.g., dovitinib (TKI258)), an HDAC inhibitor (e.g., vorinostat and HBI-8000), a HIF-2alpha inhibitor (e.g., PT2385 and PT2977), an Hsp90 inhibitor (e.g., 17-allylamino-17-demethoxygeldanamycin, AUY922, and IPI-504), and growth factor and receptor inhibitors (e.g., E10030). Other therapeutic interventions effective for treating a subject having a genetic biomarker in VHL are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in VHL is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in VHL, the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as having a genetic biomarker (e.g., a mutation) in TERT (e.g., in a TERT promoter), the subject is administered a therapeutic intervention. In some embodiments, a subject identified as having a genetic biomarker (e.g., a mutation) in TERT (e.g., in a TERT promoter) is identified as having a cancer (e.g., based on the presence of the genetic biomarker, either alone or in combination with the presence of other genetic biomarkers and/or the presence of one or more members of other classes of biomarkers and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject having a genetic biomarker in TERT (e.g., in a TERT promoter) is one or more of eribulin, an hTERT mRNA transfected dendritic cell vaccine (e.g., AST-VAC1 (hTERT-DC, GRNVAC1)), INO-1400, INO-1401, GX301, dendritic cells transfected with hTERT-, survivin- and tumor cell derived mRNA, and arsenic trioxide. Other therapeutic interventions effective for treating a subject having a genetic biomarker in TERT (e.g., in a TERT promoter) are known in the art. In some embodiments, a therapeutic intervention administered to the subject having a genetic biomarker in TERT (e.g., in a TERT promoter) is effective in treating a cancer in the subject. For example, after administration of a therapeutic intervention that is effective in treating a subject having a genetic biomarker in TERT (e.g., in a TERT promoter), the number of cancer cells in the subject can be reduced, the size of one or more tumors in the subject can be reduced, the rate or extent of metastasis can be reduced, symptoms associated with the disease or disorder or condition can be wholly or partly alleviated, the state of the disease can be stabilized (i.e., not worsened), and/or survival can be prolonged as compared to expected survival if not receiving treatment.

In some embodiments, when a subject is identified as being at risk (e.g., increased risk) of developing a disease (e.g., using any of the variety of methods described herein), the subject is administered a therapeutic intervention. In some embodiments, a subject identified as being at risk (e.g., increased risk) of developing a disease (e.g., using any of the variety of methods described herein) is identified as being at risk of developing cancer (e.g., based on the presence of one or more genetic biomarkers, the presence of one or more protein biomarkers, the presence of one or more other biomarkers, and/or the presence of aneuploidy as described herein). In some embodiments, the therapeutic intervention administered to the subject identified as being at risk of developing cancer is a chemopreventive. Non-limiting examples of chemopreventives include a non-steroidal anti-inflammatory drug (e.g., aspirin, tolfenamic acid, indomethacin, celecoxib, sulindac sulfide, diclofenac, ibuprofen, flurbiprofen, piroxicam, diflunisal, etodolac, ketoprofen, ketorolac, nabumetone, naproxen, oxaprozin, salsalate, and tolmetin), a selective estrogen receptor modulator (e.g., tamoxifen (NOLVADEX®, SOLTAMOX™) and raloxifene (EVISTA®)), instillational BCG, valrubicin, finasteride, dutasteride, curcumin, bisdemethoxycurcumin, metformin, an aromatase inhibitor (e.g., exemestane), resveratrol, lunasin, vitamin A, isothiocyanate, green tea, luteolin, genistein, lycopene, bitter melon, withaferin A, guggulsterone, selenides, diselenides, crocetin, piperine, a statin (a 3-hydroxy-3-methylglutaryl coenzyme A reductase inhibitor), a carotenoid, a retinoid, folic acid, vitamin C, vitamin D, vitamin E, calcium, a flavonoid, and an anti-cancer vaccine. In some embodiments, tamoxifen and/or raloxifene is administered to a subject identified as being at risk of developing breast cancer. In some embodiments, instillational BCG and/or valrubicin is administered to a subject identified as being at risk of developing bladder cancer. In some embodiments, finasteride and/or dutasteride is administered to a subject identified as being at risk of developing prostate cancer. In some embodiments, celecoxib is administered to a subject identified as being at risk of developing colorectal neoplasia.

In some embodiments, the therapeutic intervention can result in an early onset of remission of a cancer in a subject. In some embodiments, the therapeutic intervention can result in an increase in the time of remission of a cancer in a subject. In some embodiments, the therapeutic intervention can result in an increase in the time of survival of a subject. In some embodiments, the therapeutic intervention can result in decreasing the size of a solid primary tumor in a subject. In some embodiments, the therapeutic intervention can result in decreasing the volume of a solid primary tumor in a subject. In some embodiments, the therapeutic intervention can result in decreasing the size of a metastasis in a subject. In some embodiments, the therapeutic intervention can result in decreasing the volume of a metastasis in a subject. In some embodiments, the therapeutic intervention can result in decreasing the tumor burden in a subject.

In some embodiments, the therapeutic intervention can result in improving the prognosis of a subject. In some embodiments, the therapeutic intervention can result in decreasing the risk of developing a metastasis in a subject. In some embodiments, the therapeutic intervention can result in decreasing the risk of developing an additional metastasis in a subject. In some embodiments, the therapeutic intervention can result in decreasing cancer cell migration in a subject. In some embodiments, the therapeutic intervention can result in decreasing cancer cell invasion in a subject. In some embodiments, the therapeutic intervention can result in a decrease in the time of hospitalization of a subject. In some embodiments, the therapeutic intervention can result in a decrease of the presence of cancer stem cells within a tumor in a subject.

In some embodiments, the therapeutic intervention can result in an increase in immune cell infiltration within the tumor microenvironment in a subject. In some embodiments, the therapeutic intervention can result in altering the immune cell composition within the tumor microenvironment of a tumor in a subject. In some embodiments, the therapeutic intervention can result in modulating a previously immunosuppressive tumor microenvironment into an immunogenic, inflammatory tumor microenvironment. In some embodiments, the therapeutic intervention can result in a reversal of the immunosuppressive tumor microenvironment in a subject.

In some embodiments, the therapeutic intervention can halt tumor progression in a subject. In some embodiments, the therapeutic intervention can delay tumor progression in a subject. In some embodiments, the therapeutic intervention can inhibit tumor progression in a subject. In some embodiments, the therapeutic intervention can inhibit immune checkpoint pathways of a tumor in a subject. In some embodiments, the therapeutic intervention can immuno-modulate the tumor microenvironment of a tumor in a subject. In some embodiments, the therapeutic intervention can immuno-modulate the tumor macroenvironment of a tumor in a subject.

In some embodiments, a therapeutic intervention can reduce the number of cancer cells present in a subject. For example, a therapeutic intervention can reduce the number of cancer cells present in a subject by 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more. In some embodiments, a therapeutic intervention can reduce the number of cancer cells present in a subject such that no cancer cells are observable. In some embodiments, a therapeutic intervention can reduce the observable tumors present in a subject.

In some embodiments, one or more therapeutic interventions (e.g., a chemotherapy or any of the other appropriate therapeutic interventions disclosed herein) can be administered to a subject once or multiple times over a period of time ranging from days to weeks. In some embodiments, one or more therapeutic interventions can be formulated into a pharmaceutically acceptable composition for administration to a subject having cancer. For example, a therapeutically effective amount of a therapeutic intervention (e.g., a chemotherapeutic or immunotherapeutic agent) can be formulated together with one or more pharmaceutically acceptable carriers (additives) and/or diluents. A pharmaceutical composition can be formulated for administration in solid or liquid form including, without limitation, sterile solutions, suspensions, sustained-release formulations, tablets, capsules, pills, powders, and granules.

Pharmaceutically acceptable carriers, fillers, and vehicles that may be used in a pharmaceutical composition described herein include, without limitation, ion exchangers, alumina, aluminum stearate, lecithin, serum proteins, such as human serum albumin, buffer substances such as phosphates, glycine, sorbic acid, potassium sorbate, partial glyceride mixtures of saturated vegetable fatty acids, water, salts or electrolytes, such as protamine sulfate, disodium hydrogen phosphate, potassium hydrogen phosphate, sodium chloride, zinc salts, colloidal silica, magnesium trisilicate, polyvinyl pyrrolidone, cellulose-based substances, polyethylene glycol, sodium carboxymethylcellulose, polyacrylates, waxes, polyethylene-polyoxypropylene-block polymers, and wool fat.

A pharmaceutical composition containing one or more therapeutic interventions can be designed for oral or parenteral (including subcutaneous, intramuscular, intravenous, and intradermal) administration. When being administered orally, a pharmaceutical composition can be in the form of a pill, tablet, or capsule. Compositions suitable for parenteral administration include aqueous and non-aqueous sterile injection solutions that can contain anti-oxidants, buffers, bacteriostats, and solutes that render the formulation isotonic with the blood of the intended recipient. The formulations can be presented in unit-dose or multi-dose containers, for example, sealed ampules and vials, and may be stored in a freeze dried (lyophilized) condition requiring only the addition of the sterile liquid carrier, for example, water for injections, immediately prior to use. Extemporaneous injection solutions and suspensions may be prepared from sterile powders, granules, and tablets.

In some embodiments, a pharmaceutically acceptable composition including one or more therapeutic interventions can be administered locally or systemically. For example, a composition provided herein can be administered locally by injection into tumors. In some embodiments, a composition provided herein can be administered systemically, e.g., orally or by injection, to a subject (e.g., a human).

Effective doses can vary depending on the severity of the cancer, the route of administration, the age and general health condition of the subject, excipient usage, the possibility of co-usage with other therapeutic treatments such as use of other agents, and the judgment of the treating physician.

An effective amount of a composition containing one or more therapeutic interventions can be any amount that reduces the number of cancer cells present within the subject without producing significant toxicity to the subject. If a particular subject fails to respond to a particular amount, then the amount of a therapeutic intervention can be increased by, for example, two fold. After receiving this higher amount, the subject can be monitored for both responsiveness to the treatment and toxicity symptoms, and adjustments made accordingly. The effective amount can remain constant or can be adjusted as a sliding scale or variable dose depending on the subject response to treatment. Various factors can influence the actual effective amount used for a particular application. For example, the frequency of administration, duration of treatment, use of multiple treatment agents, route of administration, and severity of the condition (e.g., cancer) may require an increase or decrease in the actual effective amount administered.
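The adjustment logic described above can be sketched as a small function. This is only an illustration under stated assumptions: the `Observation` fields, the function name, and the halving on toxicity are hypothetical (the text says only that adjustments are "made accordingly"), and amounts are in arbitrary units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical monitoring result for one subject at one time point."""
    responded: bool              # assumed flag: number of cancer cells was reduced
    significant_toxicity: bool   # assumed flag: significant toxicity was observed


def next_amount(current_amount: float,
                obs: Observation,
                escalation_factor: float = 2.0,
                reduction_factor: float = 0.5) -> float:
    """Return the amount to use next, given the latest observation.

    - No response and no significant toxicity: increase the amount
      (two fold by default, matching the example in the text).
    - Significant toxicity: decrease the amount (the factor of 0.5 is an
      assumption made for this sketch, not a value from the text).
    - Response without significant toxicity: keep the amount constant.
    """
    if obs.significant_toxicity:
        return current_amount * reduction_factor
    if not obs.responded:
        return current_amount * escalation_factor
    return current_amount
```

Toxicity is checked before responsiveness so that an amount is never escalated while significant toxicity is being observed; that ordering is a design choice of the sketch.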

The frequency of administration of one or more therapeutic interventions can be any frequency that reduces the number of cancer cells present within the subject without producing significant toxicity to the subject. For example, the frequency of administration of one or more therapeutic interventions can be from about two to about three times a week to about two to about three times a month. The frequency of administration of one or more therapeutic interventions can remain constant or can be variable during the duration of treatment. A course of treatment with a composition containing one or more therapeutic interventions can include rest periods. For example, a composition containing one or more therapeutic interventions can be administered daily over a two-week period followed by a two-week rest period, and such a regimen can be repeated multiple times. As with the effective amount, various factors can influence the actual frequency of administration used for a particular application. For example, the effective amount, duration of treatment, use of multiple treatment agents, route of administration, and severity of the condition (e.g., cancer) may require an increase or decrease in administration frequency.
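As an illustration of the regimen example above (daily administration over a two-week period followed by a two-week rest period, repeated), the following sketch lists the administration dates. The function name, parameters, and default cycle count are hypothetical and chosen only for this example.

```python
from datetime import date, timedelta
from typing import List


def dosing_dates(start: date,
                 on_days: int = 14,
                 rest_days: int = 14,
                 cycles: int = 2) -> List[date]:
    """List the dates on which a composition is administered.

    Each cycle consists of `on_days` consecutive daily administrations
    followed by `rest_days` without administration.
    """
    cycle_length = on_days + rest_days
    dates: List[date] = []
    for cycle in range(cycles):
        cycle_start = start + timedelta(days=cycle * cycle_length)
        for day in range(on_days):
            dates.append(cycle_start + timedelta(days=day))
    return dates
```

For instance, `dosing_dates(date(2024, 1, 1), cycles=2)` yields 28 administration dates: the first 14 run from 1 January through 14 January, and the second block begins on 29 January after the rest period.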

An effective duration for administering a composition containing one or more therapeutic interventions can be any duration that reduces the number of cancer cells present within the subject without producing significant toxicity to the subject. In some embodiments, the effective duration can vary from several days to several weeks. In general, the effective duration for reducing the number of cancer cells present within the subject can range in duration from about one week to about four weeks. Multiple factors can influence the actual effective duration used for a particular treatment. For example, an effective duration can vary with the frequency of administration, effective amount, use of multiple treatment agents, route of administration, and severity of the condition being treated.

Exemplary Embodiments

In some embodiments, provided herein are methods of detecting biomarkers, which methods include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from a subject and detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject. In some embodiments of methods of detecting biomarkers, the methods further include detecting the presence of aneuploidy in the sample obtained from the subject. In some embodiments of methods of detecting biomarkers, the first class of biomarkers includes genetic biomarkers. In some embodiments of methods of detecting biomarkers, members of the first class of biomarkers are associated with the presence of cancer. In some embodiments of methods of detecting biomarkers, the second class of biomarkers includes protein biomarkers. In some embodiments of methods of detecting biomarkers, members of the second class of biomarkers are associated with the presence of cancer.

In some embodiments, provided herein are methods of detecting biomarkers, which methods include: detecting the presence of one or more members of a first class of biomarkers in a sample obtained from a subject; and detecting the presence of aneuploidy in the sample obtained from the subject. In some embodiments of methods of detecting biomarkers, the methods further include detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject. In some embodiments of methods of detecting biomarkers, the first class of biomarkers comprises genetic biomarkers. In some embodiments of methods of detecting biomarkers, the first class of biomarkers comprises protein biomarkers. In some embodiments of methods of detecting biomarkers, members of the first class of biomarkers are associated with the presence of cancer. In some embodiments of methods of detecting biomarkers in which the first class of biomarkers comprises protein biomarkers, the second class of biomarkers comprises genetic biomarkers. In some embodiments of methods of detecting biomarkers in which the first class of biomarkers comprises protein biomarkers, members of the second class of biomarkers are associated with the presence of cancer.

In some embodiments, provided herein are methods of identifying a subject as having cancer, which methods include: detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject; detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject; and identifying the subject as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of one or more members of the second class of biomarkers is detected in the sample, or both. In some embodiments of methods of identifying a subject as having cancer, the methods further include detecting the presence of aneuploidy in the sample obtained from the subject; wherein the subject is identified as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of one or more members of the second class of biomarkers is detected in the sample, the presence of aneuploidy is detected in the sample, or combinations thereof. In some embodiments of methods of identifying a subject as having cancer, the first class of biomarkers includes genetic biomarkers. In some embodiments of methods of identifying a subject as having cancer, members of the first class of biomarkers are associated with the presence of cancer. In some embodiments of methods of identifying a subject as having cancer, the second class of biomarkers includes protein biomarkers. In some embodiments of methods of identifying a subject as having cancer, members of the second class of biomarkers are associated with the presence of cancer.

In some embodiments, provided herein are methods of identifying a subject as having cancer, which methods include: detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject; detecting the presence of aneuploidy in the sample obtained from the subject; and identifying the subject as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of aneuploidy is detected in the sample, or both. In some embodiments of methods of identifying a subject as having cancer, the methods further include detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject; wherein the subject is identified as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of one or more members of the second class of biomarkers is detected in the sample, the presence of aneuploidy is detected in the sample, or combinations thereof. In some embodiments of methods of identifying a subject as having cancer, the first class of biomarkers comprises genetic biomarkers. In some embodiments of methods of identifying a subject as having cancer, the first class of biomarkers comprises protein biomarkers. In some embodiments of methods of identifying a subject as having cancer, members of the first class of biomarkers are associated with the presence of cancer. In some embodiments of methods of identifying a subject as having cancer in which the first class of biomarkers comprises protein biomarkers, the second class of biomarkers comprises genetic biomarkers. In some embodiments of methods of identifying a subject as having cancer in which the first class of biomarkers comprises protein biomarkers, members of the second class of biomarkers are associated with the presence of cancer.
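The identification rule recited in the two preceding paragraphs (a subject is identified as having cancer when a first-class biomarker, a second-class biomarker, aneuploidy, or any combination of these is detected) can be sketched as follows. The class and field names are hypothetical, and how each detection flag is produced from the sample is outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleFindings:
    """Hypothetical per-sample detection flags."""
    first_class_detected: bool           # e.g., one or more genetic biomarkers
    second_class_detected: bool = False  # e.g., one or more protein biomarkers
    aneuploidy_detected: bool = False


def identified_as_having_cancer(findings: SampleFindings) -> bool:
    """Apply the 'any one, or combinations thereof' rule from the text."""
    return (findings.first_class_detected
            or findings.second_class_detected
            or findings.aneuploidy_detected)
```

Embodiments that test only two of the three signals correspond to leaving the unused flag at its default of `False`.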

In some embodiments of methods of identifying a subject as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject, the sensitivity of detecting the presence of cancer is increased as compared to methods which include detecting the presence of one or more members of only a single class of biomarkers. In some embodiments of methods of identifying a subject as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject, the specificity of detecting the presence of cancer is increased as compared to methods which include detecting the presence of one or more members of only a single class of biomarkers. In some embodiments of methods of identifying a subject as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of aneuploidy in the sample obtained from the subject, the sensitivity of detecting the presence of cancer is increased as compared to methods which include detecting only one or more members of the class of biomarkers or only the presence of aneuploidy. In some embodiments of methods of identifying a subject as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of aneuploidy in the sample obtained from the subject, the specificity of detecting the presence of cancer is increased as compared to methods which include detecting only one or more members of the class of biomarkers or only the presence of aneuploidy.

In some embodiments, provided herein are methods of treating a subject identified as having cancer, which methods include: detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject; detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject; identifying the subject as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of one or more members of the second class of biomarkers is detected in the sample, or both; and administering to the subject a therapeutic intervention. In some embodiments of methods of treating a subject identified as having cancer, the methods further include detecting the presence of aneuploidy in the sample obtained from the subject; wherein the subject is identified as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of one or more members of the second class of biomarkers is detected in the sample, the presence of aneuploidy is detected in the sample, or combinations thereof. In some embodiments of methods of treating a subject identified as having cancer, the first class of biomarkers includes genetic biomarkers. In some embodiments of methods of treating a subject identified as having cancer, members of the first class of biomarkers are associated with the presence of cancer. In some embodiments of methods of treating a subject identified as having cancer, the second class of biomarkers includes protein biomarkers. In some embodiments of methods of treating a subject identified as having cancer, members of the second class of biomarkers are associated with the presence of cancer.

In some embodiments, provided herein are methods of treating a subject identified as having cancer, which methods include: detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject; detecting the presence of aneuploidy in the sample obtained from the subject; identifying the subject as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of aneuploidy is detected in the sample, or both; and administering to the subject a therapeutic intervention. In some embodiments of methods of treating a subject identified as having cancer, the methods further include detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject; wherein the subject is identified as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of one or more members of the second class of biomarkers is detected in the sample, the presence of aneuploidy is detected in the sample, or combinations thereof. In some embodiments of methods of treating a subject identified as having cancer, the first class of biomarkers comprises genetic biomarkers. In some embodiments of methods of treating a subject identified as having cancer, the first class of biomarkers comprises protein biomarkers. In some embodiments of methods of treating a subject identified as having cancer, members of the first class of biomarkers are associated with the presence of cancer. In some embodiments of methods of treating a subject identified as having cancer in which the first class of biomarkers comprises protein biomarkers, the second class of biomarkers comprises genetic biomarkers. In some embodiments of methods of treating a subject identified as having cancer in which the first class of biomarkers comprises protein biomarkers, members of the second class of biomarkers are associated with the presence of cancer.

In some embodiments of methods of treating a subject identified as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject, the sensitivity of detecting the presence of cancer is increased as compared to methods which include detecting the presence of one or more members of only a single class of biomarkers. In some embodiments of methods of treating a subject identified as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of one or more members of a second class of biomarkers in the sample obtained from the subject, the specificity of detecting the presence of cancer is increased as compared to methods which include detecting the presence of one or more members of only a single class of biomarkers. In some embodiments of methods of treating a subject identified as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of aneuploidy in the sample obtained from the subject, the sensitivity of detecting the presence of cancer is increased as compared to methods which include detecting only one or more members of the class of biomarkers or only the presence of aneuploidy. In some embodiments of methods of treating a subject identified as having cancer that include detecting the presence of one or more members of a first class of biomarkers in a sample obtained from the subject and detecting the presence of aneuploidy in the sample obtained from the subject, the specificity of detecting the presence of cancer is increased as compared to methods which include detecting only one or more members of the class of biomarkers or only the presence of aneuploidy.

In some embodiments of methods of treating a subject identified as having cancer, any of the variety of therapeutic interventions described herein (e.g., surgery, chemotherapy, hormone therapy, targeted therapy, radiation therapy, and combinations thereof) can be administered to the subject.

In some embodiments, provided herein are methods of identifying a subject as having cancer that include: detecting the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject; detecting the presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject; and identifying the subject as having cancer when the presence of one or more genetic biomarkers is detected in circulating DNA in said blood sample, when an elevated level of one or more peptide biomarkers is detected in said blood sample, or both. In some embodiments, provided herein are methods of treating a subject having cancer that include: detecting the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject; detecting the presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject; and administering one or more therapeutic interventions (e.g., one or more of surgery, chemotherapy, hormone therapy, targeted therapy, and radiation therapy) to said subject when the presence of one or more genetic biomarkers is detected in circulating DNA in said blood sample, when an elevated level of one or more peptide biomarkers is detected in said blood sample, or both. In some embodiments, provided herein are methods of identifying the location of a cancer in a subject, said method comprising: detecting the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject; detecting the presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject; and identifying the location of the cancer in the subject when the presence of one or more genetic biomarkers is detected in circulating DNA in said blood sample, when an elevated level of one or more peptide biomarkers is detected in said blood sample, or both.

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the subject is a human. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject, said blood sample is a plasma sample. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject, the cancer is a Stage I cancer. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject, the one or more genetic biomarkers comprise one or more modifications in one or more genes. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject, the one or more modifications comprise inactivating modifications, and wherein said one or more genes comprise tumor suppressor genes. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject, the one or more modifications include a modification independently selected from single base substitutions, insertions, or deletions, translocations, fusions, breaks, duplications, or amplifications.

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the cancer is liver cancer, ovary cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject in which the cancer is liver cancer, ovary cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer and in which the one or more genetic biomarkers are genes, the one or more genes are one or more of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject in which the one or more genes are one or more of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS, the one or more protein biomarkers are one or more of CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or myeloperoxidase (MPO). In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject in which the one or more genes are one or more of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS, the one or more protein biomarkers are one or more of CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, or CA15-3. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject in which the one or more genes are one or more of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS, the one or more protein biomarkers are one or more of CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, or CA15-3. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject in which the one or more genes are one or more of: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS, the one or more modifications are independently selected from the modifications set forth in Table 3.

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the cancer is pancreatic cancer. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject in which the cancer is pancreatic cancer and in which the one or more genetic biomarkers are genes, the one or more genes are one or more of KRAS, TP53, CDKN2A, or SMAD4. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject in which the one or more genes are one or more of: KRAS, TP53, CDKN2A, or SMAD4, the one or more protein biomarkers are one or more of CA19-9, CEA, HGF, or OPN.

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the step of detecting the presence of one or more genetic biomarkers is performed using a method that includes a PCR-based multiplex assay, a PCR-based singleplex assay, a digital PCR assay, a droplet digital PCR (ddPCR) assay, a microarray assay, a next-generation sequencing assay, a Sanger sequencing assay, a quantitative PCR assay, or a ligation assay. In some embodiments, the multiplex PCR-based sequencing assay includes: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products.
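The three UID steps above can be illustrated with a minimal Python sketch of the downstream bookkeeping: grouping redundantly sequenced reads into UID-families and deriving one consensus sequence per family. The function names, the toy reads, and the 95% agreement threshold are illustrative assumptions; the text above does not specify how families are collapsed.

```python
from collections import Counter, defaultdict


def build_uid_families(reads):
    """Group (uid, sequence) read pairs into UID-families keyed by UID."""
    families = defaultdict(list)
    for uid, sequence in reads:
        families[uid].append(sequence)
    return dict(families)


def family_consensus(sequences, min_fraction=0.95):
    """Return the sequence shared by at least min_fraction of the reads.

    min_fraction is a hypothetical threshold chosen for illustration.
    Returns None when no sequence reaches the threshold.
    """
    sequence, count = Counter(sequences).most_common(1)[0]
    return sequence if count / len(sequences) >= min_fraction else None


# Toy reads: each tuple is (UID, sequenced amplification product).
reads = [
    ("AAGT", "ACGT"), ("AAGT", "ACGT"), ("AAGT", "ACGT"),
    ("CCTA", "ACTT"), ("CCTA", "ACTT"),
    ("GGAC", "ACGT"), ("GGAC", "ACTT"),  # discordant family
]

families = build_uid_families(reads)
consensus = {uid: family_consensus(seqs) for uid, seqs in families.items()}
```

In this sketch a family whose reads disagree yields no consensus, so a change present in only some reads of a family is not counted as a mutation in the original template molecule.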

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the step of detecting the presence of an elevated level of one or more protein biomarkers is performed using a multiplex immunoassay system.

In some embodiments of identifying a subject as having cancer, the methods further include identifying a location of the cancer. In some embodiments of identifying a subject as having cancer, the methods further include administering to the subject one or more therapeutic interventions. In some embodiments, the one or more therapeutic interventions are one or more of: surgery, chemotherapy, hormone therapy, targeted therapy, or radiation therapy.

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the methods further include: a) determining a mutation allele frequency in the blood sample for two or more of the genetic biomarkers; b) obtaining a score that indicates the likelihood that the subject has a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers to a first reference distribution of mutation allele frequency in control samples and a second reference distribution of mutation allele frequency in samples collected from subjects having a cancer; and c) identifying the subject as having a cancer when the score is higher than a reference value for the score. In some embodiments, obtaining the score includes calculating the ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genetic biomarkers. In some embodiments, the score is determined by calculating the weighted average of the log ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genetic biomarkers. In some embodiments, detecting the presence of one or more genetic biomarkers is performed in two or more test samples using an assay comprising: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments, the score is calculated by the following formula:

$\Omega = \sum_{i = 1} w_{i} \cdot \ln \frac{p_{i}^{C}}{p_{i}^{N}},$

wherein $w_{i}$ is the number of unique identifier sequences (UIDs) in a test sample $i$ divided by the total number of UIDs for that mutation in all test samples, $p_{i}^{N}$ is the probability of the mutation allele frequency in the first reference distribution, and $p_{i}^{C}$ is the probability of the mutation allele frequency in the second reference distribution.
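As a worked illustration of the formula, the following Python sketch computes the score for one mutation from per-test-sample UID counts and probabilities. It assumes the probabilities of the observed mutation allele frequency under the two reference distributions have already been evaluated; the function name `omega_score` and the example numbers are hypothetical.

```python
import math


def omega_score(uid_counts, p_normal, p_cancer):
    """Weighted sum of log probability ratios over test samples.

    uid_counts[i] : number of UIDs for the mutation in test sample i
    p_normal[i]   : probability under the first (control) reference distribution
    p_cancer[i]   : probability under the second (cancer) reference distribution
    """
    if not (len(uid_counts) == len(p_normal) == len(p_cancer)):
        raise ValueError("all inputs must have one entry per test sample")
    total_uids = sum(uid_counts)
    if total_uids <= 0:
        raise ValueError("total UID count must be positive")
    score = 0.0
    for uids, p_n, p_c in zip(uid_counts, p_normal, p_cancer):
        weight = uids / total_uids  # w_i: share of UIDs from test sample i
        score += weight * math.log(p_c / p_n)
    return score


# Hypothetical example: two test samples with 100 and 300 UIDs.
example = omega_score([100, 300], [0.2, 0.1], [0.1, 0.4])
```

Because the weights sum to one, a mutation whose allele frequency is equally probable under both reference distributions contributes a score of zero, and the score becomes positive as the cancer distribution becomes the more likely of the two.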

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the sensitivity of identifying a subject as having cancer is increased as compared to the sensitivity obtained when the presence of one or more members of only a single class of biomarkers in the sample obtained from the subject is detected. In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject, the specificity of identifying a subject as having cancer is increased as compared to the specificity obtained when the presence of one or more members of only a single class of biomarkers in the sample obtained from the subject is detected.

In some embodiments of identifying a subject as having cancer, treating a subject having cancer, or identifying the location of a cancer in a subject (e.g., based on the presence of one or more genetic biomarkers in circulating DNA in a blood sample obtained from said subject and/or presence of an elevated level of one or more peptide biomarkers in the blood sample obtained from a subject), the methods further include: detecting the presence of aneuploidy in the sample obtained from the subject, wherein the subject is identified as having cancer when the presence of one or more members of the first class of biomarkers is detected in the sample, the presence of one or more members of the second class of biomarkers is detected in the sample, the presence of aneuploidy is detected in the sample, or combinations thereof.

In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in a sample collected from the patient for each mutation in one or more of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS; b) obtaining a score that indicates the likelihood that the patient has a cancer by comparing the mutation allele frequency of each mutation in the selected genes to a first reference distribution of mutation allele frequency in control samples and a second reference distribution of mutation allele frequency in samples collected from patients having a cancer; and c) identifying the patient as having a cancer when the score is higher than a reference value for the score. In some embodiments, such methods further include measuring the concentration of one or more of the following proteins: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or myeloperoxidase (MPO) and determining that the concentration of at least one protein is higher than a reference value. In some embodiments, such methods further include measuring the concentration of one or more of the following proteins: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, or CA15-3 and determining that the concentration of at least one protein is higher than a reference value. In some embodiments, such methods further include measuring the concentration of one or more of the following proteins: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, or CA15-3 and determining that the concentration of at least one protein is higher than a reference value. In some embodiments, obtaining the score comprises calculating the ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genes. In some embodiments, the score is determined by calculating the weighted average of the log ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genes. In some embodiments, the sample is assayed in two or more test samples using a method that includes: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments, the score is calculated by the following formula:

$\Omega = \sum_{i = 1} w_{i} \cdot \ln \frac{p_{i}^{C}}{p_{i}^{N}},$

wherein $w_{i}$ is the number of unique identifier sequences (UIDs) in a test sample $i$ divided by the total number of UIDs for that mutation in all test samples, $p_{i}^{N}$ is the probability of the mutation allele frequency in the first reference distribution, and $p_{i}^{C}$ is the probability of the mutation allele frequency in the second reference distribution. In some embodiments, the blood sample is a plasma sample. In some embodiments, the cancer is a Stage I cancer. In some embodiments, the cancer is liver cancer, ovary cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. In some embodiments, the at least one mutation comprises an inactivating modification, and wherein the at least one mutation is in a tumor suppressor gene. In some embodiments, the at least one mutation comprises a mutation that is a single base substitution, an insertion, or a deletion. In some embodiments, the at least one mutation is a mutation set forth in Table 3. In some embodiments, the step of detecting the level of one or more proteins is performed using a multiplex immunoassay system. In some embodiments, the methods further include identifying a location of the cancer. In some embodiments, the methods further include administering to the mammal one or more therapeutic interventions. In some embodiments, the one or more therapeutic interventions are one or more of: surgery, chemotherapy, hormone therapy, targeted therapy, or radiation therapy.

In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in a sample collected from the patient for each mutation in one or more of the following genes: KRAS, TP53, CDKN2A, or SMAD4; b) obtaining a score that indicates the likelihood that the patient has a cancer by comparing the mutation allele frequency of each mutation in the selected genes to a first reference distribution of mutation allele frequency in control samples and a second reference distribution of mutation allele frequency in samples collected from patients having a cancer; and c) identifying the patient as having a cancer when the score is higher than a reference value for the score. In some embodiments, such methods further include measuring the concentration of one or more of the following proteins: CA19-9, CEA, HGF, or OPN and determining that the concentration of at least one protein is higher than a reference value. In some embodiments, obtaining the score comprises calculating the ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genes. In some embodiments, the score is determined by calculating the weighted average of the log ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genes. In some embodiments, the sample is assayed in two or more test samples using a method that includes: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments, the score is calculated by the following formula:

$\Omega = \sum_{i = 1} w_{i} \cdot \ln \frac{p_{i}^{C}}{p_{i}^{N}},$

wherein $w_{i}$ is the number of unique identifier sequences (UIDs) in a test sample $i$ divided by the total number of UIDs for that mutation in all test samples, $p_{i}^{N}$ is the probability of the mutation allele frequency in the first reference distribution, and $p_{i}^{C}$ is the probability of the mutation allele frequency in the second reference distribution. In some embodiments, the blood sample is a plasma sample. In some embodiments, the cancer is a Stage I cancer. In some embodiments, the cancer is liver cancer, ovary cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. In some embodiments, the at least one mutation comprises an inactivating modification, and wherein the at least one mutation is in a tumor suppressor gene. In some embodiments, the at least one mutation comprises a mutation that is a single base substitution, an insertion, or a deletion. In some embodiments, the step of detecting the level of one or more proteins is performed using a multiplex immunoassay system. In some embodiments, the methods further include identifying a location of the cancer. In some embodiments, the methods further include administering to the mammal one or more therapeutic interventions. In some embodiments, the one or more therapeutic interventions are one or more of: surgery, chemotherapy, hormone therapy, targeted therapy, or radiation therapy.

In some embodiments, provided herein are systems for generating a report for a patient that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS; b) at least one computer database including: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; and ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value.
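The report-generation step in (d) reduces to a threshold comparison, sketched below in Python under stated assumptions. The function name `generate_report`, the dictionary layout, and the wording of the result strings are hypothetical; the text above specifies only that the report depends on whether the score is higher than the reference value.

```python
def generate_report(patient_id, score, reference_value):
    """Build a minimal report from a precomputed score.

    The patient is reported as having a cancer only when the score is
    strictly higher than the reference value, as in step (d).
    """
    has_cancer = score > reference_value
    return {
        "patient_id": patient_id,
        "score": score,
        "reference_value": reference_value,
        "result": "cancer indicated" if has_cancer else "cancer not indicated",
    }


# Hypothetical scores and reference value, for illustration only.
positive = generate_report("patient-001", score=1.8, reference_value=1.0)
negative = generate_report("patient-002", score=1.0, reference_value=1.0)
```

A score equal to the reference value is reported as not indicating cancer, because the condition in (d) is "higher than" the reference value.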

In some embodiments, provided herein are systems for generating a report for a patient that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS, TP53, CDKN2A, and/or SMAD4; b) at least one computer database including: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; and ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value.

In some embodiments, provided herein are systems for generating a report to identify a cancer treatment for a patient that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS; b) at least one computer database comprising: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; and iii) a listing of cancer treatments with efficacy linked to a biological state of at least one member of the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value; and e) a computer-readable program code comprising instructions to identify for the patient at least one cancer treatment from the listing of cancer treatments in (b)(iii) when the patient is identified as having a cancer in step (d), wherein the calculated score provides an indication that the identified cancer treatment will be effective in the patient.

In some embodiments, provided herein are systems for generating a report to identify a cancer treatment for a patient that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS, TP53, CDKN2A, and/or SMAD4; b) at least one computer database comprising: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; and iii) a listing of cancer treatments with efficacy linked to a biological state of at least one member of the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value; and e) a computer-readable program code comprising instructions to identify for the patient at least one cancer treatment from the listing of cancer treatments in (b)(iii) when the patient is identified as having a cancer in step (d), wherein the calculated score provides an indication that the identified cancer treatment will be effective in the patient.

In some embodiments, provided herein are systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS; b) at least one computer database comprising: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; and iii) a listing of cancer treatments with efficacy linked to a biological state of at least one member of the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value; and e) a computer-readable program code comprising instructions to identify for the patient at least one cancer treatment from the listing of cancer treatments in (b)(iii) when the patient is identified as having a cancer in step (d), wherein the calculated score provides an indication that the identified cancer treatment will be effective in the patient.

In some embodiments, provided herein are systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS, TP53, CDKN2A, and/or SMAD4; b) at least one computer database comprising: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; and iii) a listing of cancer treatments with efficacy linked to a biological state of at least one member of the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value; and e) a computer-readable program code comprising instructions to identify for the patient at least one cancer treatment from the listing of cancer treatments in (b)(iii) when the patient is identified as having a cancer in step (d), wherein the calculated score provides an indication that the identified cancer treatment will be effective in the patient.

In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, calculating the score comprises calculating the ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genes. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the score is calculated by calculating the weighted average of the log ratio of the probability of the mutation allele frequency in the first reference distribution to the probability of the mutation allele frequency in the second reference distribution for each mutation in the selected genes. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the computer database comprises test sample data, and the test sample data includes assignment of a unique identifier (UID) to each of a plurality of template molecules present in the sample. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the score is calculated by the following formula:

$\Omega = \sum\limits_{i = 1} w_{i} \ln \frac{p_{i}^{C}}{p_{i}^{N}},$

wherein $w_{i}$ is the number of unique identifier sequences (UIDs) in a test sample i divided by the total number of UIDs for that mutation in all test samples, $p_{i}^{N}$ is the probability of the mutation allele frequency in the first reference distribution, and $p_{i}^{C}$ is the probability of the mutation allele frequency in the second reference distribution. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the device configured to assay a set of genes comprises a device that employs a PCR-based multiplex assay, a PCR-based singleplex assay, a digital PCR assay, a droplet digital PCR (ddPCR) assay, a microarray assay, a next-generation sequencing assay, a Sanger sequencing assay, a quantitative PCR assay, or a ligation assay. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the multiplex PCR-based sequencing assay comprises: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products.
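The score Ω defined above, together with the report rule of step (d), can be illustrated with a minimal Python sketch. The function names (`uid_weights`, `omega_score`, `report_call`) and the example numbers are hypothetical and are not part of the disclosure; the sketch assumes that the probabilities under both reference distributions have already been evaluated and are strictly positive.

```python
import math
from typing import Sequence


def uid_weights(uid_counts: Sequence[int]) -> list:
    """w_i: UID count for sample i divided by the total UID count
    for that mutation across all test samples."""
    total = sum(uid_counts)
    if total <= 0:
        raise ValueError("total UID count must be positive")
    return [count / total for count in uid_counts]


def omega_score(weights: Sequence[float],
                p_cancer: Sequence[float],
                p_normal: Sequence[float]) -> float:
    """Omega = sum_i w_i * ln(p_i^C / p_i^N).

    p_cancer[i] is the probability of the observed mutation allele
    frequency under the second (cancer) reference distribution and
    p_normal[i] the probability under the first (control) distribution.
    Both are assumed to be strictly positive.
    """
    if not (len(weights) == len(p_cancer) == len(p_normal)):
        raise ValueError("inputs must have the same length")
    return sum(w * math.log(pc / pn)
               for w, pc, pn in zip(weights, p_cancer, p_normal))


def report_call(score: float, reference_value: float) -> str:
    """Report rule of step (d): positive only if the score is higher
    than the reference value."""
    return "cancer" if score > reference_value else "no cancer"


# Hypothetical example: two test samples with 30 and 70 UIDs.
weights = uid_weights([30, 70])
score = omega_score(weights, p_cancer=[0.2, 0.4], p_normal=[0.1, 0.1])
print(report_call(score, reference_value=1.0))
```

In this sketch the reference value is passed in by the caller; how it is chosen is left open, as in the text above.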
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3.
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO). In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3.
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, or 4) of: CA19-9, CEA, HGF, and/or OPN.
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS, TP53, CDKN2A, and/or SMAD4, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, or 4) of: CA19-9, CEA, HGF, and/or OPN. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the sensitivity of the system in identifying a subject as having cancer is improved as compared to conventional systems for generating a report for a patient. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the specificity of the system in identifying a subject as having cancer is improved as compared to conventional systems for generating a report for a patient.
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the first and second reference distributions of mutation allele frequency values are input into the system from a location that is remote from the at least one computer database. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the first and second reference distributions of mutation allele frequency values are input into the system over an internet connection. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the report is in electronic or paper format.
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the cancer is any of the variety of cancer types described herein (see, e.g., section entitled “Cancers”). In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS and at least one device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, or 8) of: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or myeloperoxidase (MPO), the cancer is liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer.
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS and at least one device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or 11) of: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and/or CA15-3, the cancer is liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer.
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS and at least one device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, or 9) of: CA19-9, CEA, HGF, OPN, CA125, AFP, prolactin, TIMP-1, and/or CA15-3, the cancer is liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, or 4) of the following genes: KRAS, TP53, CDKN2A, and/or SMAD4 and at least one device (e.g., a device that includes a multiplex immunoassay system) configured to detect a level of one or more protein biomarkers in a biological sample, wherein the protein biomarkers are one or more (e.g., 1, 2, 3, or 4) of: CA19-9, CEA, HGF, and/or OPN, the cancer is pancreatic cancer.
In some embodiments of systems for generating a report for a patient or systems for generating a report to identify a cancer treatment for a patient, the patient is identified as a candidate for increased monitoring, further diagnostic testing, or both (e.g., any of the variety of increased monitoring or further diagnostic methods described herein).

In some embodiments, provided herein are methods of identifying a subject as having cancer that include: detecting the presence of one or more genetic biomarkers in a first sample obtained from said subject; detecting the presence of aneuploidy in a second sample obtained from said subject; and identifying the subject as having cancer when the presence of one or more genetic biomarkers is detected in the first sample, when the presence of aneuploidy is detected in the second sample, or both. In some embodiments, provided herein are methods of treating a subject having cancer that include: detecting the presence of one or more genetic biomarkers in a first sample obtained from said subject; detecting the presence of aneuploidy in a second sample obtained from said subject; and administering one or more therapeutic interventions to said subject when the presence of one or more genetic biomarkers is detected in the first sample, when the presence of aneuploidy is detected in the second sample, or both.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the step of detecting the presence of one or more genetic biomarkers comprises a method that includes a PCR-based multiplex assay, a PCR-based singleplex assay, a digital PCR assay, a droplet digital PCR (ddPCR) assay, a microarray assay, a next-generation sequencing assay, a Sanger sequencing assay, a quantitative PCR assay, or a ligation assay. In some embodiments, the step of detecting the presence of one or more genetic biomarkers comprises a method that increases the sensitivity of massively parallel sequencing instruments with an error reduction technique comprising: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products.
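The error reduction steps a. through c. above can be illustrated on the analysis side with a minimal Python sketch: reads that share a UID are grouped into a UID-family, and a base is accepted for the original template only when the family agrees on it. The family-size and agreement thresholds, the function names, and the `(uid, base)` read representation are illustrative assumptions and are not specified in the text.

```python
from collections import Counter, defaultdict


def build_uid_families(reads):
    """Group observed bases at one genomic position by the UID of the
    template molecule each read was amplified from.
    `reads` is an iterable of (uid, base) pairs (assumed format)."""
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)
    return dict(families)


def consensus_calls(families, min_family_size=2, min_agreement=0.95):
    """Return one consensus base per UID-family.

    Families that were sequenced fewer than `min_family_size` times, or
    whose most common base falls below `min_agreement`, are discarded.
    Both thresholds are assumptions made for this sketch.
    """
    calls = {}
    for uid, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            calls[uid] = base
    return calls


def mutant_allele_frequency(calls, mutant_base):
    """Fraction of consensus-called template molecules carrying the mutant base."""
    if not calls:
        return 0.0
    mutant = sum(1 for base in calls.values() if base == mutant_base)
    return mutant / len(calls)


# Hypothetical reads: the GGG family contains a likely sequencing error.
reads = ([("AAA", "T")] * 3 + [("CCC", "C")] * 3
         + [("GGG", "C"), ("GGG", "C"), ("GGG", "T")] + [("TTT", "T")])
calls = consensus_calls(build_uid_families(reads))
print(mutant_allele_frequency(calls, "T"))
```

Because a true mutation is present in the original template, it appears in essentially every member of its UID-family, whereas an amplification or sequencing error typically appears in only some members; this is why redundant sequencing of each family supports error reduction.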

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the step of detecting the presence of aneuploidy includes: amplifying long interspersed nucleotide elements (LINEs) across the genome of the second sample, thereby obtaining a plurality of amplicons; sequencing the plurality of amplicons to obtain sequencing reads; placing the sequencing reads into pre-defined clusters of genomic intervals; and determining the presence of aneuploidy in the second sample when the number of sequencing reads of a genomic region within a pre-defined cluster is significantly different from the expected number of sequencing reads of the genomic region within the pre-defined cluster. In some embodiments, the pre-defined clusters of genomic intervals are created by grouping genomic intervals based on read depths of sequencing reads of two or more euploid samples. In some embodiments, determining that the number of sequencing reads of a genomic region within a pre-defined cluster is significantly different from the expected number of sequencing reads of the genomic region within the pre-defined cluster includes: calculating the distribution of sequencing reads of all genomic intervals in the pre-defined cluster, wherein the sequence reads of all genomic intervals in the pre-defined cluster are obtained by sequencing the amplicons derived from the second sample; and determining the number of sequencing reads of the genomic region is outside a significance threshold of the distribution.
In some embodiments, determining that the number of sequencing reads of a genomic region within a pre-defined cluster is significantly different from the expected number of sequencing reads of the genomic region within the pre-defined cluster includes: calculating sums of distributions of the sequencing reads in each genomic interval using the equation $\sum_{i=1}^{I} R_{i} \sim N\left(\sum_{i=1}^{I} \mu_{i}, \sum_{i=1}^{I} \sigma_{i}^{2}\right)$, wherein $R_{i}$ is the number of sequencing reads, $I$ is the number of clusters on a chromosome arm, $N$ is a Gaussian distribution with parameters $\mu_{i}$ and $\sigma_{i}^{2}$, where $\mu_{i}$ is the mean number of sequencing reads in each genomic interval, and where $\sigma_{i}^{2}$ is the variance of sequencing reads in each genomic interval; calculating a Z-score of a chromosome arm using the quantile function $1 - \mathrm{CDF}\left(\sum_{i=1}^{I} \mu_{i}, \sum_{i=1}^{I} \sigma_{i}^{2}\right)$; and identifying the presence of an aneuploidy in the tissue of the mammal when the Z-score is outside a significance threshold.
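The arm-level test described above can be sketched in Python as follows. Because a sum of independent Gaussian variables is itself Gaussian, with mean and variance equal to the sums of the individual means and variances, the total read count on a chromosome arm can be compared against N(Σμ_i, Σσ_i²). The function names, the example numbers, and the threshold of 3 standard deviations are assumptions made for illustration only; the text does not fix a particular significance threshold.

```python
import math
from statistics import NormalDist


def arm_z_score(observed, means, variances):
    """Z-score of the summed read count on a chromosome arm.

    observed[i]  : reads observed in interval/cluster i (R_i)
    means[i]     : expected reads mu_i
    variances[i] : expected variance sigma_i^2
    """
    total = sum(observed)
    mu = sum(means)
    sigma = math.sqrt(sum(variances))
    return (total - mu) / sigma


def upper_tail_probability(observed, means, variances):
    """1 - CDF of the summed read count under N(sum mu_i, sum sigma_i^2)."""
    dist = NormalDist(mu=sum(means), sigma=math.sqrt(sum(variances)))
    return 1.0 - dist.cdf(sum(observed))


def arm_is_aneuploid(observed, means, variances, z_threshold=3.0):
    """Flag a gain or loss when |Z| exceeds the (assumed) threshold."""
    return abs(arm_z_score(observed, means, variances)) > z_threshold


# Hypothetical arm with three clusters showing an apparent gain.
observed = [110, 210, 330]
means = [100, 200, 300]
variances = [25, 50, 25]
print(arm_z_score(observed, means, variances))
print(arm_is_aneuploid(observed, means, variances))
```

The sketch treats gains and losses symmetrically by testing the absolute Z-score; a one-sided test using only the upper-tail probability would detect gains alone.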

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the step of detecting the presence of aneuploidy includes: sequencing a plurality of amplicons obtained from a second sample to obtain variant sequencing reads for a plurality of polymorphic sites; selecting a chromosome arm having the variant sequencing reads and the reference sequencing reads on both alleles that is greater than about 3; determining a variant-allele frequency (VAF) of each polymorphic site in the selected chromosome arm, wherein said VAF is the number of variant sequencing reads/total number of sequencing reads; and identifying the presence of aneuploidy on the selected chromosome arm if the VAF of one or more polymorphic sites is outside a significance threshold of a normal distribution, wherein the expected VAF is 0.5. In some embodiments, the step of sequencing includes: a. assigning a unique identifier (UID) to each of a plurality of amplicons, b. amplifying each uniquely tagged amplicon to create UID-families, and c. redundantly sequencing the amplification products. In some embodiments, the step of identifying the presence of aneuploidy on the selected chromosome arm includes: calculating a Z-score for one or more polymorphic sites on said selected chromosome arm using the equation

$Z \sim \frac{\sum\limits_{i = 1}^{k} w_{i} Z_{i}}{\sqrt{\sum\limits_{i = 1}^{k} w_{i}^{2}}},$

where $w_{i}$ is the UID depth at variant i, $Z_{i}$ is the Z-score of the VAF for variant i, and k is the number of variants observed on the chromosome arm.
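The weighted combination above can be sketched in Python as follows. The per-site Z-score shown here uses a binomial standard deviation around the expected VAF of 0.5, which is an assumption made for this sketch; the text states only that the VAF is compared with a normal distribution whose expected value is 0.5. The function names and example values are likewise hypothetical.

```python
import math


def site_z_score(variant_reads, total_reads, expected_vaf=0.5):
    """Z-score of the VAF at one polymorphic site.

    VAF = variant reads / total reads. The standard deviation is taken
    from a binomial model (an assumption of this sketch).
    """
    vaf = variant_reads / total_reads
    sd = math.sqrt(expected_vaf * (1.0 - expected_vaf) / total_reads)
    return (vaf - expected_vaf) / sd


def combined_arm_z(weights, z_scores):
    """Weighted combination: sum(w_i * Z_i) / sqrt(sum(w_i^2)),
    where w_i is the UID depth at variant i."""
    if len(weights) != len(z_scores) or not weights:
        raise ValueError("weights and z_scores must be non-empty and equal length")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Hypothetical arm with two heterozygous sites at UID depths 3 and 4.
print(combined_arm_z([3, 4], [1.0, 2.0]))
```

Dividing by the square root of the summed squared weights keeps the combined statistic on the scale of a standard normal variable when the individual Z-scores are independent standard normal variables, so the same significance threshold can be applied to arms with different numbers of informative sites.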

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the methods further include performing cytology on the first sample, the second sample, or both, and identifying the subject as having cancer when the presence of one or more genetic biomarkers is detected in the first sample, when the presence of aneuploidy is detected in the second sample, a positive cytology indicates that the subject has cancer, or combinations thereof.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the cancer is a bladder cancer or an upper tract urothelial carcinoma; the one or more genetic biomarkers are one or more of: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, or VHL; the method further includes detecting the presence of at least one genetic biomarker (e.g., a mutation) in a TERT promoter; and the presence of one or more genetic biomarkers in one or more of TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL, the presence of the at least one genetic biomarker (e.g., a mutation) in the TERT promoter, or the presence of aneuploidy indicates that the subject has bladder cancer. In some embodiments, the presence of one or more genetic biomarkers in one or more of TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL, the presence of the at least one genetic biomarker (e.g., a mutation) in the TERT promoter, and the presence of aneuploidy indicates that the subject has bladder cancer. In some embodiments, the one or more genetic biomarkers are TP53, FGFR3, or both. In some embodiments, the step of detecting the presence of aneuploidy comprises detecting the presence of aneuploidy on one or more of chromosome arms 5q, 8q, and 9p. In some embodiments, the one or more genetic biomarkers, the one or more genetic biomarkers (e.g., mutations) in the TERT promoter, or both, are present in 0.03% or fewer of the urinary cells in the sample. In some embodiments, the step of detecting the presence of at least one genetic biomarker (e.g., a mutation) in the TERT promoter is performed using a PCR-based multiplex assay, a Sanger sequencing assay, or a next-generation sequencing assay.
In some embodiments, the step of detectingthe presence of at least one genetic biomarker (e.g., a mutation) in theTERT promoter is performed by increasing the sensitivity of massivelyparallel sequencing instruments with an error reduction techniqueincluding: a. assigning a unique identifier (UID) to each of a pluralityof template molecules present in the sample; b. amplifying each uniquelytagged template molecule to create UID-families; and c. redundantlysequencing the amplification products.
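
The UID-based error reduction steps a through c can be illustrated computationally. The sketch below is a simplified assumption about how redundantly sequenced reads might be collapsed, not the implementation used in this disclosure: reads sharing a UID are grouped into a family, and a base is accepted only when the family is large enough and nearly unanimous. The function names and the thresholds `min_family_size` and `min_agreement` are hypothetical.

```python
from collections import Counter, defaultdict


def uid_family_consensus(reads, min_family_size=2, min_agreement=0.95):
    """Collapse reads at one genomic position into per-UID consensus bases.

    reads: iterable of (uid, base) pairs, one pair per sequenced read.
    Returns a dict mapping each retained UID to its consensus base.
    Families that are too small or too discordant are dropped, on the
    assumption that their variation reflects amplification or sequencing
    error rather than the original template molecule.
    """
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)

    consensus = {}
    for uid, bases in families.items():
        if len(bases) < min_family_size:
            continue
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[uid] = base
    return consensus


def mutant_fraction(consensus, reference_base):
    """Fraction of retained UID families whose consensus differs from the reference."""
    if not consensus:
        return 0.0
    mutant = sum(1 for base in consensus.values() if base != reference_base)
    return mutant / len(consensus)
```

Counting families rather than raw reads means that an error introduced in a late amplification cycle, which affects only part of a family, does not register as a mutation.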

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the method includes detecting bladder cancer and further includes administering transurethral resection of the bladder (TURB), intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, cystectomy or cystoprostatectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination thereof.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the method includes detecting an upper tract urothelial carcinoma and further includes administering transurethral resection, intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, ureterectomy or nephroureterectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination thereof.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the cancer is an ovarian or endometrial cancer; the one or more genetic biomarkers are one or more of NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A; and the presence of one or more mutations in one or more of NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A, the presence of aneuploidy, or both indicates that the subject has ovarian or endometrial cancer. In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the cancer is an endometrial cancer; the one or more genetic biomarkers are one or more of PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, or PPP2R1A; and the presence of one or more mutations in one or more of PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, or PPP2R1A, the presence of aneuploidy, or both indicates that the subject has endometrial cancer. In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the cancer is a high-grade serous carcinoma; the one or more genetic biomarkers are in TP53; and the presence of one or more genetic biomarkers in TP53, the presence of aneuploidy, or both indicates that the subject has a high-grade serous carcinoma.
In some embodiments, the step of detecting the presence of aneuploidy includes detecting the presence of aneuploidy on one or more of chromosome arms 4p, 7q, 8q, and 9q. In some embodiments, the first sample, the second sample, or both are collected via intrauterine sampling. In some embodiments, the first sample, the second sample, or both are collected with a Tao brush. In some embodiments, the methods further include detecting in a circulating tumor DNA (ctDNA) sample obtained from the subject the presence of at least one genetic biomarker in one or more of the following genes: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, or TP53. In some embodiments, the methods further include administering to the subject a therapy, wherein the therapy includes: surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, immune checkpoint inhibitors, or combinations thereof.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the genetic biomarker is a mutation in a gene.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the first sample and the second sample are the same. In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the first sample and the second sample are different. In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the first sample is a blood sample.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the methods further include a) determining a mutation allele frequency in the sample for two or more of the genetic biomarkers; b) obtaining a score that indicates the likelihood that the subject has a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers against a reference distribution of mutation allele frequency for each mutation in control samples; and c) identifying the subject as having a cancer when the score is higher than a reference value for the score. In some embodiments, the methods further include selecting, from the two or more genetic biomarkers, the one genetic biomarker with the highest score indicating the probability that the subject has a cancer; and comparing the score for the selected genetic biomarker to the reference value. In some embodiments, the score is Stouffer's Z-score.
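
As a rough illustration of steps a) through c), the sketch below standardizes an observed mutation allele frequency (MAF) against control samples and combines several such values with the unweighted Stouffer method, Z = (z_1 + ... + z_k) / sqrt(k). It assumes the control MAFs can be summarized by a mean and standard deviation and that the values being combined come from independent measurements; the function names and the reference value are hypothetical, and the disclosure may compute the score differently.

```python
import math
from statistics import mean, stdev


def z_against_controls(maf, control_mafs):
    """Standardize one observed MAF against the control-sample distribution."""
    mu = mean(control_mafs)
    sigma = stdev(control_mafs)
    if sigma == 0:
        raise ValueError("control MAFs have zero spread; z-score undefined")
    return (maf - mu) / sigma


def stouffer_z(z_scores):
    """Unweighted Stouffer combination of independent z-scores."""
    if not z_scores:
        raise ValueError("at least one z-score is required")
    return sum(z_scores) / math.sqrt(len(z_scores))


def identify_as_having_cancer(score, reference_value):
    """Step c): positive call when the score exceeds the reference value."""
    return score > reference_value
```

Selecting the single biomarker with the highest score, as in the alternative embodiment, would amount to taking `max(...)` over the per-biomarker scores before the comparison in `identify_as_having_cancer`.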

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the methods further include a) determining a mutation allele frequency in the sample for two or more of the genetic biomarkers; and b) comparing the mutation allele frequency of each mutation in the selected genetic biomarkers to the maximum mutation allele frequency of each mutation in control samples.
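
The comparison in step b) can be sketched as a per-mutation threshold test. This is a minimal example under the assumption that a mutation is of interest when its MAF exceeds the largest MAF seen for that same mutation in any control sample; the function name, mutation labels, and numbers are illustrative only.

```python
def exceeds_control_maximum(sample_mafs, control_mafs):
    """Flag each mutation whose MAF is above the maximum seen in controls.

    sample_mafs:  dict mapping mutation label -> MAF in the test sample.
    control_mafs: dict mapping mutation label -> list of MAFs in controls.
    A mutation never observed in controls is compared against 0.0.
    """
    flagged = {}
    for mutation, maf in sample_mafs.items():
        control_max = max(control_mafs.get(mutation, []), default=0.0)
        flagged[mutation] = maf > control_max
    return flagged


# Illustrative values only; not data from this disclosure.
sample = {"TP53 R175H": 0.004, "KRAS G12D": 0.0001}
controls = {
    "TP53 R175H": [0.0, 0.0005, 0.001],
    "KRAS G12D": [0.0002, 0.0003],
}
flags = exceeds_control_maximum(sample, controls)
```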

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the sensitivity of identifying a subject as having cancer is increased as compared to: 1) the sensitivity obtained when only the presence of one or more genetic biomarkers in the sample obtained from the subject is detected, or 2) the sensitivity obtained when the presence of only aneuploidy in the sample obtained from the subject is detected. In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the specificity of identifying a subject as having cancer is increased as compared to: 1) the specificity obtained when only the presence of one or more genetic biomarkers in the sample obtained from the subject is detected, or 2) the specificity obtained when the presence of only aneuploidy in the sample obtained from the subject is detected.
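
To make the comparison concrete, sensitivity and specificity can be computed for each marker alone and for a combined call. The sketch below uses a small invented cohort and an "either marker" combination rule, both of which are assumptions for illustration and not results from this disclosure.

```python
def sensitivity_specificity(calls, truth):
    """Return (sensitivity, specificity) for boolean calls against ground truth."""
    pairs = list(zip(calls, truth))
    tp = sum(1 for c, t in pairs if c and t)          # cancer, called positive
    fn = sum(1 for c, t in pairs if not c and t)      # cancer, called negative
    tn = sum(1 for c, t in pairs if not c and not t)  # control, called negative
    fp = sum(1 for c, t in pairs if c and not t)      # control, called positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one cancer case and one control")
    return tp / (tp + fn), tn / (tn + fp)


# Invented cohort: four subjects with cancer, then four controls.
truth      = [True, True, True, True, False, False, False, False]
mutation   = [True, True, False, False, False, False, False, False]
aneuploidy = [False, True, True, False, False, False, False, True]
combined   = [m or a for m, a in zip(mutation, aneuploidy)]
```

In this toy cohort the combined call detects three of the four cancers, whereas each marker alone detects two. An "either" rule can only add positive calls, so the effect on specificity depends on how the markers are combined and thresholded.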

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the methods further include detecting the presence of one or more protein biomarkers in a sample obtained from the subject, wherein the sample is the first sample, the second sample, or a third sample. In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the subject is a human. In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the cancer is a Stage I cancer.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject), the one or more genetic biomarkers comprise one or more modifications in one or more genes. In some embodiments, the one or more modifications comprise inactivating modifications, and said one or more genes comprise tumor suppressor genes. In some embodiments, the one or more modifications include a modification independently selected from single base substitutions, insertions, deletions, translocations, fusions, breaks, duplications, or amplifications.

In some embodiments of identifying a subject as having cancer or treating a subject having cancer (e.g., based on the presence of one or more genetic biomarkers in a first sample obtained from said subject and/or the presence of aneuploidy in a second sample obtained from the subject),

In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in a sample collected from the patient for each mutation in one or more of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A; b) obtaining a score that indicates the likelihood that the subject has a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers against a reference distribution of mutation allele frequency for each mutation in control samples; and c) identifying the subject as having a cancer when the score is higher than a reference value for the score. In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in a sample collected from the patient for each mutation in one or more of the following genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, or PPP2R1A; b) obtaining a score that indicates the likelihood that the subject has a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers against a reference distribution of mutation allele frequency for each mutation in control samples; and c) identifying the subject as having a cancer when the score is higher than a reference value for the score. In some embodiments, the methods further include selecting one genetic biomarker with the highest score that indicates the probability that the subject has a cancer from the two or more genetic biomarkers; and comparing the score for the selected genetic biomarker to the reference value.
In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in the sample for one or more of the genetic biomarkers; b) obtaining a score that indicates the probability that the subject does not have a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers against a reference distribution of mutation allele frequency for each mutation in control samples; and c) identifying the subject as not having a cancer when the score is lower than a reference value for the score. In some embodiments, the methods further include selecting one genetic biomarker with the lowest score that indicates the probability that the subject does not have a cancer; and comparing the score for the selected genetic biomarker to the reference value. In some embodiments, the score is Stouffer's Z-score. In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in the blood sample for two or more of the genetic biomarkers; and b) comparing the mutation allele frequency of each mutation in the selected genetic biomarkers to the maximum mutation allele frequency of each mutation in control samples. In some embodiments of the methods of identifying a patient as having a cancer described in the preceding paragraph, the sample is assayed in two or more test samples using a method comprising: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments of identifying a patient as having a cancer, the methods further include detecting the presence of aneuploidy in the sample (e.g., the presence of aneuploidy on one or more of chromosome arms 4p, 7q, 8q, and 9q).
In some embodiments of identifying a patient as having a cancer, the methods further include detecting in a circulating tumor DNA (ctDNA) sample obtained from the subject the presence of at least one genetic biomarker in one or more of the following genes: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, or TP53. In some embodiments, the cancer is a Stage I cancer. In some embodiments, the cancer is cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer. In some embodiments, the at least one mutation comprises an inactivating modification, and the at least one mutation is in a tumor suppressor gene. In some embodiments, the modifications are independently selected from single base substitutions, insertions, deletions, translocations, fusions, breaks, duplications, or amplifications. In some embodiments, the methods further include administering to the subject one or more therapeutic interventions (e.g., one or more of: surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, immune checkpoint inhibitors, or combinations thereof).

In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in a sample collected from the patient for each mutation in one or more of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, or VHL; b) obtaining a score that indicates the likelihood that the subject has a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers against a reference distribution of mutation allele frequency for each mutation in control samples; and c) identifying the subject as having a cancer when the score is higher than a reference value for the score. In some embodiments, the methods further include selecting one genetic biomarker with the highest score that indicates the probability that the subject has a cancer from the two or more genetic biomarkers; and comparing the score for the selected genetic biomarker to the reference value. In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in the sample for one or more of the genetic biomarkers; b) obtaining a score that indicates the probability that the subject does not have a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers against a reference distribution of mutation allele frequency for each mutation in control samples; and c) identifying the subject as not having a cancer when the score is lower than a reference value for the score. In some embodiments, the methods further include selecting one genetic biomarker with the lowest score that indicates the probability that the subject does not have a cancer; and comparing the score for the selected genetic biomarker to the reference value. In some embodiments, the score is Stouffer's Z-score.
In some embodiments, provided herein are methods of identifying a patient as having a cancer that include: a) determining a mutation allele frequency in the blood sample for two or more of the genetic biomarkers; and b) comparing the mutation allele frequency of each mutation in the selected genetic biomarkers to the maximum mutation allele frequency of each mutation in control samples. In some embodiments of the methods of identifying a patient as having a cancer described in the preceding paragraph, the sample is assayed in two or more test samples using a method comprising: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments, the methods further include detecting the presence of at least one genetic biomarker (e.g., a mutation) in a TERT promoter in the sample. In some embodiments of identifying a patient as having a cancer, the methods further include detecting the presence of aneuploidy in the sample (e.g., the presence of aneuploidy on one or more of chromosome arms 5q, 8q, and 9p). In some embodiments of identifying a patient as having a cancer, the methods further include detecting in a circulating tumor DNA (ctDNA) sample obtained from the subject the presence of at least one genetic biomarker in one or more of the following genes: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, or TP53. In some embodiments, the cancer is a Stage I cancer. In some embodiments, the cancer is bladder cancer or an upper-tract urothelial cancer (UTUC). In some embodiments, the at least one mutation comprises an inactivating modification, and the at least one mutation is in a tumor suppressor gene.
In some embodiments, the modifications are independently selected from single base substitutions, insertions, deletions, translocations, fusions, breaks, duplications, or amplifications. In some embodiments, the methods further include administering to the subject one or more therapeutic interventions (e.g., one or more of: surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, immune checkpoint inhibitors, or combinations thereof).

In some embodiments, provided herein are systems for generating a report for a patient that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL; b) at least one computer database including: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; and ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value. In some embodiments of systems for generating a report for a patient, the systems further include at least one device configured to detect the presence of at least one genetic biomarker (e.g., at least one mutation) in a TERT promoter.

In some embodiments, provided herein are systems for generating a reportfor a patient that include: at least one device configured to assay aset of genes in a biological sample to determine mutation allelefrequency of each mutation in the set of genes in a sample collectedfrom the patient, wherein the set of genes comprises one or more (e.g.,1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of thefollowing genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43,PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/orCDKN2A; b) at least one computer database including: i) a firstreference distribution of mutation allele frequency in control samplesfor each mutation in the set of genes; and ii) a second referencedistribution of mutation allele frequency in samples collected frompatients having a cancer for each mutation in the set of genes; c) acomputer-readable program code comprising instructions to execute thefollowing: i) inputting the mutation allele frequency in the biologicalsample of each mutation in the set of genes; ii) comparing the mutationallele frequency to the first reference distribution; iii) comparing themutation allele frequency to the second reference distribution; and iv)calculating a score that indicates the likelihood that the patient has acancer; d) a computer-readable program code comprising instructions togenerate a report that indicates the patient as having a cancer if thescore is higher than a reference value; or a report that indicates thepatient as not having a cancer if the score is not higher than thereference value. 
In some embodiments, provided herein are systems forgenerating a report for a patient that include: at least one deviceconfigured to assay a set of genes in a biological sample to determinemutation allele frequency of each mutation in the set of genes in asample collected from the patient, wherein the set of genes comprisesone or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12) of thefollowing genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE,APC, FBXW7, RNF43, and/or PPP2R1A; b) at least one computer databaseincluding: i) a first reference distribution of mutation allelefrequency in control samples for each mutation in the set of genes; andii) a second reference distribution of mutation allele frequency insamples collected from patients having a cancer for each mutation in theset of genes; c) a computer-readable program code comprisinginstructions to execute the following: i) inputting the mutation allelefrequency in the biological sample of each mutation in the set of genes;ii) comparing the mutation allele frequency to the first referencedistribution; iii) comparing the mutation allele frequency to the secondreference distribution; and iv) calculating a score that indicates thelikelihood that the patient has a cancer; d) a computer-readable programcode comprising instructions to generate a report that indicates thepatient as having a cancer if the score is higher than a referencevalue; or a report that indicates the patient as not having a cancer ifthe score is not higher than the reference value. 
In some embodiments, provided herein are systems for generating a report for a patient that include: a) at least one device configured to assay a TP53 gene in a biological sample collected from the patient to determine mutation allele frequency of TP53; b) at least one computer database including: i) a first reference distribution of mutation allele frequency in control samples for TP53; and ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for TP53; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of TP53; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value.
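
The program-code elements c) and d) of these report-generating systems can be sketched as follows. This is one possible reading, not the disclosed implementation: it assumes each reference distribution is modeled as a normal distribution and defines the score as the log-likelihood ratio of the cancer distribution to the control distribution. The function names, the distribution parameters, and the reference value of 0.0 are all hypothetical.

```python
import math
from statistics import NormalDist


def _log_density(dist, x):
    """Log of the normal density, computed directly to avoid underflow."""
    z = (x - dist.mean) / dist.stdev
    return -0.5 * z * z - math.log(dist.stdev) - 0.5 * math.log(2 * math.pi)


def likelihood_score(maf, control_dist, cancer_dist):
    """Steps c)(ii)-(iv): compare the MAF to both reference distributions.

    Positive values mean the MAF is better explained by the cancer
    reference distribution than by the control reference distribution.
    """
    return _log_density(cancer_dist, maf) - _log_density(control_dist, maf)


def generate_report(patient_id, maf, control_dist, cancer_dist, reference_value=0.0):
    """Step d): report a positive call only if the score exceeds the reference value."""
    score = likelihood_score(maf, control_dist, cancer_dist)
    status = "cancer indicated" if score > reference_value else "cancer not indicated"
    return {"patient_id": patient_id, "maf": maf, "score": score, "status": status}


# Illustrative reference distributions; parameters are invented.
control_reference = NormalDist(mu=0.001, sigma=0.001)
cancer_reference = NormalDist(mu=0.02, sigma=0.01)
```

With these invented parameters, `generate_report("P1", 0.02, control_reference, cancer_reference)` yields a positive call and a MAF of 0.001 yields a negative one.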

In some embodiments, provided herein are systems for generating a reportto identify a cancer treatment for a patient that include: a) at leastone device configured to assay a set of genes in a biological sample todetermine mutation allele frequency of each mutation in the set of genesin a sample collected from the patient, wherein the set of genescomprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of thefollowing genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS,MET, and/or VHL; b) at least one computer database comprising: i) afirst reference distribution of mutation allele frequency in controlsamples for each mutation in the set of genes; ii) a second referencedistribution of mutation allele frequency in samples collected frompatients having a cancer for each mutation in the set of genes; and iii)a listing of cancer treatment with efficacy linked to a biological stateof at least one member of the set of genes; c) a computer-readableprogram code comprising instructions to execute the following: i)inputting the mutation allele frequency in the biological sample of eachmutation in the set of genes; ii) comparing the mutation allelefrequency to the first reference distribution; iii) comparing themutation allele frequency to the second reference distribution; and iv)calculating a score that indicates the likelihood that the patient has acancer; d) a computer-readable program code comprising instructions togenerate a report that indicates the patient as having a cancer if thescore is higher than a reference value; or a report that indicates thepatient as not having a cancer if the score is not higher than thereference value; and e) a computer-readable program code comprisinginstructions to identify for the patient at least one cancer treatmentfrom the listing of cancer treatments in (b)(iii) when the patient isidentified as having a cancer in step (d), wherein the calculated scoreprovides an indication that the identified cancer treatment will 
be effective in the patient. In some embodiments of systems for generating a report to identify a cancer treatment for a patient, the systems further include at least one device configured to detect the presence of at least one genetic biomarker (e.g., at least one mutation) in a TERT promoter.

In some embodiments, provided herein are systems for generating a reportto identify a cancer treatment for a patient that include: a) at leastone device configured to assay a set of genes in a biological sample todetermine mutation allele frequency of each mutation in the set of genesin a sample collected from the patient, wherein the set of genescomprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS,POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1,APC, EGFR, BRAF, and/or CDKN2A; b) at least one computer databasecomprising: i) a first reference distribution of mutation allelefrequency in control samples for each mutation in the set of genes; ii)a second reference distribution of mutation allele frequency in samplescollected from patients having a cancer for each mutation in the set ofgenes; and iii) a listing of cancer treatment with efficacy linked to abiological state of at least one member of the set of genes; c) acomputer-readable program code comprising instructions to execute thefollowing: i) inputting the mutation allele frequency in the biologicalsample of each mutation in the set of genes; ii) comparing the mutationallele frequency to the first reference distribution; iii) comparing themutation allele frequency to the second reference distribution; and iv)calculating a score that indicates the likelihood that the patient has acancer; d) a computer-readable program code comprising instructions togenerate a report that indicates the patient as having a cancer if thescore is higher than a reference value; or a report that indicates thepatient as not having a cancer if the score is not higher than thereference value; and e) a computer-readable program code comprisinginstructions to identify for the patient at least one cancer treatmentfrom the listing of cancer treatments in (b)(iii) when the patient isidentified as having a cancer in step (d), wherein 
the calculated score provides an indication that the identified cancer treatment will be effective in the patient. In some embodiments, provided herein are systems for generating a report to identify a cancer treatment for a patient that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12) of the following genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/or PPP2R1A; b) at least one computer database comprising: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; and iii) a listing of cancer treatment with efficacy linked to a biological state of at least one member of the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value; and e) a computer-readable program code comprising instructions to identify for the patient at least one cancer treatment from the listing of cancer treatments in (b)(iii) when the patient is identified as having a cancer in step (d), wherein the calculated
scoreprovides an indication that the identified cancer treatment will beeffective in the patient. In some embodiments, provided herein aresystems for generating a report to identify a cancer treatment for apatient that include: a) at least one device configured to assay a TP53gene in a biological sample collected from the patient to determinemutation allele frequency of TP53; b) at least one computer databasecomprising: i) a first reference distribution of mutation allelefrequency in control samples for TP53; ii) a second referencedistribution of mutation allele frequency in samples collected frompatients having a cancer for TP53; and iii) a listing of cancertreatment with efficacy linked to a biological state of TP53; c) acomputer-readable program code comprising instructions to execute thefollowing: i) inputting the mutation allele frequency in the biologicalsample of TP53; ii) comparing the mutation allele frequency to the firstreference distribution; iii) comparing the mutation allele frequency tothe second reference distribution; and iv) calculating a score thatindicates the likelihood that the patient has a cancer; d) acomputer-readable program code comprising instructions to generate areport that indicates the patient as having a cancer if the score ishigher than a reference value; or a report that indicates the patient asnot having a cancer if the score is not higher than the reference value;and e) a computer-readable program code comprising instructions toidentify for the patient at least one cancer treatment from the listingof cancer treatments in (b)(iii) when the patient is identified ashaving a cancer in step (d), wherein the calculated score provides anindication that the identified cancer treatment will be effective in thepatient.

In some embodiments, provided herein are systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include: a) at least one device configured to assay a set of genes in a biological sample to determine mutation allele frequency of each mutation in the set of genes in a sample collected from the patient, wherein the set of genes comprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL; b) at least one computer database comprising: i) a first reference distribution of mutation allele frequency in control samples for each mutation in the set of genes; ii) a second reference distribution of mutation allele frequency in samples collected from patients having a cancer for each mutation in the set of genes; and iii) a listing of cancer treatment with efficacy linked to a biological state of at least one member of the set of genes; c) a computer-readable program code comprising instructions to execute the following: i) inputting the mutation allele frequency in the biological sample of each mutation in the set of genes; ii) comparing the mutation allele frequency to the first reference distribution; iii) comparing the mutation allele frequency to the second reference distribution; and iv) calculating a score that indicates the likelihood that the patient has a cancer; d) a computer-readable program code comprising instructions to generate a report that indicates the patient as having a cancer if the score is higher than a reference value; or a report that indicates the patient as not having a cancer if the score is not higher than the reference value; and e) a computer-readable program code comprising instructions to identify for the patient at least one cancer treatment from the listing of cancer treatments in (b)(iii) when the patient 
is identified as having a cancer in step (d), wherein the calculated score provides an indication that the identified cancer treatment will be effective in the patient. In some embodiments of systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the systems further include at least one device configured to detect the presence of at least one genetic biomarker (e.g., at least one mutation) in a TERT promoter.

In some embodiments, provided herein are systems for generating a reportfor a patient to identify the patient as a candidate for increasedmonitoring, further diagnostic testing, or both that include: a) atleast one device configured to assay a set of genes in a biologicalsample to determine mutation allele frequency of each mutation in theset of genes in a sample collected from the patient, wherein the set ofgenes in a sample collected from the patient, wherein the set of genescomprises one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS,POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1,APC, EGFR, BRAF, and/or CDKN2A; b) at least one computer databasecomprising: i) a first reference distribution of mutation allelefrequency in control samples for each mutation in the set of genes; ii)a second reference distribution of mutation allele frequency in samplescollected from patients having a cancer for each mutation in the set ofgenes; and iii) a listing of cancer treatment with efficacy linked to abiological state of at least one member of the set of genes; c) acomputer-readable program code comprising instructions to execute thefollowing: i) inputting the mutation allele frequency in the biologicalsample of each mutation in the set of genes; ii) comparing the mutationallele frequency to the first reference distribution; iii) comparing themutation allele frequency to the second reference distribution; and iv)calculating a score that indicates the likelihood that the patient has acancer; d) a computer-readable program code comprising instructions togenerate a report that indicates the patient as having a cancer if thescore is higher than a reference value; or a report that indicates thepatient as not having a cancer if the score is not higher than thereference value; and e) a computer-readable program code comprisinginstructions to identify for the patient at least one 
cancer treatmentfrom the listing of cancer treatments in (b)(iii) when the patient isidentified as having a cancer in step (d), wherein the calculated scoreprovides an indication that the identified cancer treatment will beeffective in the patient. In some embodiments, provided herein aresystems for generating a report for a patient to identify the patient asa candidate for increased monitoring, further diagnostic testing, orboth that include: a) at least one device configured to assay a set ofgenes in a biological sample to determine mutation allele frequency ofeach mutation in the set of genes comprises one or more (e.g., 1, 2, 3,4, 5, 6, 7, 8, 9, 10, 11, or 12) of the following genes: PTEN, TP53,PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/orPPP2R1A; b) at least one computer database comprising: i) a firstreference distribution of mutation allele frequency in control samplesfor each mutation in the set of genes; ii) a second referencedistribution of mutation allele frequency in samples collected frompatients having a cancer for each mutation in the set of genes; and iii)a listing of cancer treatment with efficacy linked to a biological stateof at least one member of the set of genes; c) a computer-readableprogram code comprising instructions to execute the following: i)inputting the mutation allele frequency in the biological sample of eachmutation in the set of genes; ii) comparing the mutation allelefrequency to the first reference distribution; iii) comparing themutation allele frequency to the second reference distribution; and iv)calculating a score that indicates the likelihood that the patient has acancer; d) a computer-readable program code comprising instructions togenerate a report that indicates the patient as having a cancer if thescore is higher than a reference value; or a report that indicates thepatient as not having a cancer if the score is not higher than thereference value; and e) a computer-readable program code 
comprisinginstructions to identify for the patient at least one cancer treatmentfrom the listing of cancer treatments in (b)(iii) when the patient isidentified as having a cancer in step (d), wherein the calculated scoreprovides an indication that the identified cancer treatment will beeffective in the patient. In some embodiments, provided herein aresystems for generating a report for a patient to identify the patient asa candidate for increased monitoring, further diagnostic testing, orboth that include: a) at least one device configured to assay a TP53gene in a biological sample collected from the patient to determinemutation allele frequency of TP53; b) at least one computer databasecomprising: i) a first reference distribution of mutation allelefrequency in control samples for TP53; ii) a second referencedistribution of mutation allele frequency in samples collected frompatients having a cancer for TP53; and iii) a listing of cancertreatment with efficacy linked to a biological state of TP53; c) acomputer-readable program code comprising instructions to execute thefollowing: i) inputting the mutation allele frequency in the biologicalsample of TP53; ii) comparing the mutation allele frequency to the firstreference distribution; iii) comparing the mutation allele frequency tothe second reference distribution; and iv) calculating a score thatindicates the likelihood that the patient has a cancer; d) acomputer-readable program code comprising instructions to generate areport that indicates the patient as having a cancer if the score ishigher than a reference value; or a report that indicates the patient asnot having a cancer if the score is not higher than the reference value;and e) a computer-readable program code comprising instructions toidentify for the patient at least one cancer treatment from the listingof cancer treatments in (b)(iii) when the patient is identified ashaving a cancer in step (d), wherein the calculated score provides anindication that the 
identified cancer treatment will be effective in the patient.
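
The comparison, scoring, reporting, and treatment-lookup steps (c) through (e) described above can be illustrated with a minimal Python sketch. The text does not specify how the score is computed from the two reference distributions; the Gaussian log-likelihood ratio used here is only an assumed stand-in. Every name (`gaussian_logpdf`, `generate_report`, `treatment_listing`, the placeholder treatment label) and every numeric value is hypothetical.

```python
from math import log, pi
from statistics import mean, stdev


def gaussian_logpdf(x, samples):
    """Log-density of x under a normal fit to samples (assumed model, not from the text)."""
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        raise ValueError("reference distribution has zero spread")
    return -0.5 * ((x - mu) / sigma) ** 2 - log(sigma) - 0.5 * log(2 * pi)


def generate_report(patient_mafs, control_ref, cancer_ref, reference_value, treatment_listing):
    """Steps (c) to (e): score the sample, report the call, look up treatments."""
    score = 0.0
    for gene, maf in patient_mafs.items():
        # (c)(ii) and (c)(iii): compare the MAF to each reference distribution.
        score += gaussian_logpdf(maf, cancer_ref[gene]) - gaussian_logpdf(maf, control_ref[gene])
    # (d): the report is positive only if the score exceeds the reference value.
    has_cancer = score > reference_value
    treatments = []
    if has_cancer:
        # (e): consult the listing of (b)(iii) only for a positive report.
        treatments = sorted({t for gene in patient_mafs for t in treatment_listing.get(gene, [])})
    return {"has_cancer": has_cancer, "score": score, "treatments": treatments}


# Hypothetical reference data (MAF in percent) for a single-gene TP53 system.
control_ref = {"TP53": [0.01, 0.02, 0.03]}
cancer_ref = {"TP53": [0.5, 1.0, 1.5]}
treatment_listing = {"TP53": ["hypothetical_treatment_A"]}

report = generate_report({"TP53": 0.8}, control_ref, cancer_ref, 0.0, treatment_listing)
print(report["has_cancer"], report["treatments"])
```

The log-densities are computed directly rather than as a ratio of densities so that a sample far from the control distribution does not underflow to zero.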

In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the systems further include a computer-readable program code comprising instructions to execute the following: i) determining a mutation allele frequency in the sample for two or more of the genetic biomarkers; ii) obtaining a score that indicates the likelihood that the subject has a cancer by comparing the mutation allele frequency of each mutation in the selected genetic biomarkers against a reference distribution of mutation allele frequency for each mutation in control samples; and iii) identifying the subject as having a cancer when the score is higher than a reference value for the score. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the systems further include a computer-readable program code comprising instructions to execute the following instructions: i) selecting one genetic biomarker with the highest score that indicates the probability that the subject has a cancer from the two or more genetic biomarkers; and ii) comparing the score for the selected genetic biomarker to the reference value. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the score is Stouffer's Z-score. 
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the systems further include a computer-readable program code comprising instructions to execute the following instructions: i) determining a mutation allele frequency in the sample for two or more of the genetic biomarkers; and ii) comparing the mutation allele frequency of each mutation in the selected genetic biomarkers to the maximum mutation allele frequency of each mutation in control samples.
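
The scoring options in this paragraph (a score from comparison against control distributions, selection of the highest-scoring biomarker, Stouffer's Z-score, and comparison against the control maximum) can be sketched as follows. This is a minimal illustration under stated assumptions: the per-mutation score is taken to be a z-score against the control mean and standard deviation, the Stouffer combination is unweighted, and all function names, mutation labels, and values are hypothetical rather than taken from the disclosure.

```python
from math import sqrt
from statistics import mean, stdev


def z_against_controls(maf, control_mafs):
    """Z-score of an observed MAF against control MAFs for the same mutation (assumed scoring)."""
    mu, sigma = mean(control_mafs), stdev(control_mafs)
    if sigma == 0:
        raise ValueError("control distribution has zero spread")
    return (maf - mu) / sigma


def stouffer_z(z_scores):
    """Unweighted Stouffer combination: sum of z-scores divided by sqrt(k)."""
    if not z_scores:
        raise ValueError("at least one z-score is required")
    return sum(z_scores) / sqrt(len(z_scores))


def top_biomarker(z_by_mutation):
    """Return the (mutation, z-score) pair with the highest score."""
    return max(z_by_mutation.items(), key=lambda item: item[1])


def exceeds_control_maximum(maf, control_mafs):
    """True when the observed MAF is above the largest MAF seen in controls."""
    return maf > max(control_mafs)


# Hypothetical control and patient values in arbitrary units.
controls = {"mutation_1": [1.0, 2.0, 3.0], "mutation_2": [2.0, 4.0, 6.0]}
observed = {"mutation_1": 5.0, "mutation_2": 6.0}

z = {m: z_against_controls(observed[m], controls[m]) for m in observed}
combined = stouffer_z(list(z.values()))
reference_value = 2.0  # hypothetical threshold
print(z, combined, combined > reference_value)
```

Either `combined` or the score returned by `top_biomarker` would then be compared to the reference value, depending on which embodiment is being followed.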

In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the device configured to assay a set of genes comprises a device that employs a PCR-based multiplex assay, a PCR-based singleplex assay, a digital PCR assay, a droplet digital PCR (ddPCR) assay, a microarray assay, a next-generation sequencing assay, a Sanger sequencing assay, a quantitative PCR assay, or a ligation assay. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the multiplex PCR-based sequencing assay comprises: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the system further includes a device configured to detect the presence of aneuploidy in a biological sample. 
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect the presence of aneuploidy in a biological sample. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include: 1) at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16) of the following genes: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS, and 2) at least one device configured to detect the presence of a genetic biomarker (e.g., at least one mutation) in a TERT promoter, the system further includes a device (e.g., a device that includes a multiplex immunoassay system) configured to detect the presence of aneuploidy in a biological sample. 
In some embodiments of systems forgenerating a report for a patient, systems for generating a report toidentify a cancer treatment for a patient, or systems for generating areport for a patient to identify the patient as a candidate forincreased monitoring, further diagnostic testing, or both that includeat least one device configured to assay one or more (e.g., 1, 2, 3, 4,5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the followinggenes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1,CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A, thesystem further includes a device (e.g., a device that includes amultiplex immunoassay system) configured to detect the presence ofaneuploidy in a biological sample. In some embodiments of systems forgenerating a report for a patient, systems for generating a report toidentify a cancer treatment for a patient, or systems for generating areport for a patient to identify the patient as a candidate forincreased monitoring, further diagnostic testing, or both that includeat least one device configured to assay one or more (e.g., 1, 2, 3, 4,5, 6, 7, 8, 9, 10, 11, or 12) of the following genes: PTEN, TP53,PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/orPPP2R1A, the system further includes a device (e.g., a device thatincludes a multiplex immunoassay system) configured to detect thepresence of aneuploidy in a biological sample. In some embodiments ofsystems for generating a report for a patient, systems for generating areport to identify a cancer treatment for a patient, or systems forgenerating a report for a patient to identify the patient as a candidatefor increased monitoring, further diagnostic testing, or both thatinclude at least one device configured to assay TP53, the system furtherincludes a device (e.g., a device that includes a multiplex immunoassaysystem) configured to detect the presence of aneuploidy in a biologicalsample. 
The presence of aneuploidy can be detected on one or more chromosomes or chromosomal arms that are associated with cancer. In some embodiments, the presence of aneuploidy is detected on one or more of chromosomal arms 5q, 8q, and/or 9p. In some embodiments, the presence of aneuploidy is detected on one or more of chromosomal arms 4p, 7q, 8q, and/or 9q.
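
The UID-based sequencing steps named in this paragraph (assigning a UID to each template molecule, amplifying to create UID-families, and redundantly sequencing) lend themselves to a short sketch of the downstream grouping. The Python below is illustrative only: reads are reduced to a `(uid, base)` pair at a single position, and the minimum family size and the 95% agreement threshold are assumptions chosen for the example, not parameters stated in this passage.

```python
from collections import Counter, defaultdict


def build_uid_families(reads):
    """Group sequenced bases by the UID of the template molecule they derive from."""
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)
    return dict(families)


def mutant_families(families, ref_base, min_size=2, min_fraction=0.95):
    """Return ({uid: mutant base}, number of qualifying families).

    A family qualifies if it was sequenced at least min_size times; it is
    called mutant if one non-reference base accounts for at least
    min_fraction of its reads (both thresholds are assumed values).
    """
    qualifying = {uid: bases for uid, bases in families.items() if len(bases) >= min_size}
    mutant = {}
    for uid, bases in qualifying.items():
        base, count = Counter(bases).most_common(1)[0]
        if base != ref_base and count / len(bases) >= min_fraction:
            mutant[uid] = base
    return mutant, len(qualifying)


# Hypothetical reads at one position whose reference base is "C".
reads = [
    ("AAA", "T"), ("AAA", "T"), ("AAA", "T"),  # consistent mutant family
    ("CCC", "C"), ("CCC", "C"),                # reference family
    ("GGG", "T"), ("GGG", "C"),                # discordant family, likely an error
    ("TTT", "T"),                              # singleton, not redundantly sequenced
]

families = build_uid_families(reads)
mutant, n_qualifying = mutant_families(families, ref_base="C")
maf = len(mutant) / n_qualifying  # mutation allele frequency over qualifying families
print(mutant, n_qualifying, maf)
```

Requiring agreement within a family is what lets redundant sequencing separate a mutation present in the original template from an error introduced during amplification or sequencing.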

In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the sensitivity of the system in identifying a subject as having cancer is improved as compared to conventional systems for generating a report for a patient. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the specificity of the system in identifying a subject as having cancer is improved as compared to conventional systems for generating a report for a patient. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the first and second reference distribution of mutation allele frequency values are input into the system from a location that is remote from the at least one computer database. 
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the first and second reference distribution of mutation allele frequency values are input into the system over an internet connection. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the report is in electronic or paper format.

In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both, the cancer is any of the variety of cancer types described herein (see, e.g., section entitled “Cancers”). In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10) of the following genes: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and/or VHL and at least one device configured to detect the presence of aneuploidy in a biological sample, the cancer is bladder cancer or an upper-tract urothelial carcinoma. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18) of the following genes: NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, and/or CDKN2A and at least one device configured to detect the presence of aneuploidy in a biological sample, the cancer is cervical cancer, endometrial cancer, ovarian cancer, or fallopian tubal cancer. 
In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay one or more (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, or 12) of the following genes: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and/or PPP2R1A and at least one device configured to detect the presence of aneuploidy in a biological sample, the cancer is endometrial cancer. In some embodiments of systems for generating a report for a patient, systems for generating a report to identify a cancer treatment for a patient, or systems for generating a report for a patient to identify the patient as a candidate for increased monitoring, further diagnostic testing, or both that include at least one device configured to assay TP53 and at least one device configured to detect the presence of aneuploidy in a biological sample, the cancer is a high-grade serous carcinoma. In some embodiments of systems for generating a report for a patient or systems for generating a report to identify a cancer treatment for a patient, the patient is identified as a candidate for increased monitoring, further diagnostic testing, or both (e.g., any of the variety of increased monitoring or further diagnostic methods described herein).

Many of the currently approved tests for earlier cancer detection are procedural in nature, and include colonoscopy, mammography, and cervical cytology analysis. To date, the vast majority of cancer patients evaluated with mutation-based liquid biopsies have advanced stage disease. Yet another issue with liquid biopsies is the identification of the underlying organ of origin. Because the same gene mutations drive multiple tumor types, liquid biopsies based on such alterations cannot generally identify the location of the primary tumor giving rise to a positive blood test. Described herein is a non-invasive combinatorial blood test (e.g., a test that combines DNA markers and protein markers) for the early detection and localization of many common cancers.

This document provides methods and materials for assessing and/or treating mammals (e.g., humans) having, or suspected of having, cancer. In some embodiments, this document provides methods and materials for identifying a mammal as having cancer. For example, a sample (e.g., a blood sample) obtained from a mammal can be assessed to determine if the mammal has cancer based, at least in part, on the presence or absence of one or more biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more biomarkers (e.g., peptide biomarkers). In some embodiments, this document provides methods and materials for identifying the location (e.g., the anatomic site) of a cancer in a mammal. For example, a sample (e.g., a blood sample) obtained from a mammal can be assessed to determine the location of the cancer in the mammal based, at least in part, on the presence or absence of one or more biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more biomarkers (e.g., peptide biomarkers). In some embodiments, this document provides methods and materials for identifying a mammal as having cancer, and administering one or more pharmacological interventions to treat the mammal. For example, a sample (e.g., a blood sample) obtained from a mammal can be assessed to determine if the mammal has cancer based, at least in part, on the presence or absence of one or more biomarkers (e.g., genetic biomarkers) and/or an elevated level of one or more biomarkers (e.g., peptide biomarkers), and one or more cancer treatments can be administered to the mammal.

As demonstrated herein, an analysis of only 2,001 bp of genomic DNA could detect at least one mutation in 82% of eight common cancer types. A test (CancerSEEK) was designed which assessed the levels of 10 circulating proteins as well as mutations of these 2,001 bp in circulating cell-free DNA. This test was applied to 1,005 patients with cancers of the liver, ovary, esophagus, stomach, pancreas, colorectum, lung, or breast. CancerSEEK tests were positive in a median of 70% of the eight cancer types, while fewer than 1% of 812 normal individuals scored positively. The sensitivities ranged from 69% to 98% for the detection of five cancer types (liver, ovary, esophagus, stomach, and pancreas) for which there are no screening tests available for average-risk individuals. Moreover, the source of the cancer could be localized to a small number of anatomic sites in a median of 84% of the patients scoring positive in the CancerSEEK assay.

Having the ability to use a blood test having very high specificity (e.g., by combining DNA markers and protein markers) can allow clinicians to detect cancers at earlier stages resulting in earlier treatment with fewer unnecessary follow-up procedures, less anxiety, and/or reduced cancer deaths.

In general, one aspect of this document features a method for identifying a mammal as having cancer. The method can include, or consist essentially of, detecting one or more genetic biomarkers in circulating DNA in a blood sample obtained from a mammal; detecting an elevated level of one or more peptide biomarkers in the blood sample obtained from a mammal; and identifying the mammal as having cancer when the presence of one or more genetic biomarkers is detected in circulating DNA in said blood sample, when an elevated level of one or more peptide biomarkers is detected in said blood sample, or both. The mammal can be a human. The blood sample can be a plasma sample. The cancer can be a Stage I cancer. The cancer can be a liver cancer, an ovary cancer, an esophageal cancer, a stomach cancer, a pancreatic cancer, a colorectal cancer, a lung cancer, a breast cancer, or a prostate cancer. The one or more genetic biomarkers can include one or more modifications in one or more genes. The one or more modifications can include inactivating modifications, and the one or more genes can include tumor suppressor genes. The one or more modifications can independently be selected from single base substitutions, insertions, and deletions. The one or more genes can include NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS. The one or more modifications can independently be selected from the modifications set forth in Table 5. The step of detecting the presence of one or more genetic biomarkers can be performed using a multiplex PCR-based sequencing assay. The multiplex PCR-based sequencing assay can include assigning a unique identifier (UID) to each template molecule; amplifying each uniquely tagged template molecule to create UID-families; and redundantly sequencing the amplification products. The one or more peptide biomarkers can include prolactin, OPN, IL-6, CEA, CA125, HGF, Myeloperoxidase, CA19-9, Midkine, and/or TIMP-1. 
The step of detecting the level of one or more peptide biomarkers can be performed using a multiplex immunoassay system. The method also can include identifying a location of said cancer. The method also can include administering to the mammal one or more cancer treatments. The one or more cancer treatments can include surgery, chemotherapy, hormone therapy, targeted therapy, radiation therapy, and combinations thereof.
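
The identification rule stated in this aspect (positive when a genetic biomarker is detected, when a peptide biomarker is elevated, or both) can be written as a short Python sketch. The function name, the mutation label, the protein levels, and the per-protein cutoffs below are all hypothetical and chosen only for illustration; the passage does not state what level counts as elevated.

```python
def identify_cancer(detected_mutations, protein_levels, elevated_cutoffs):
    """Return (positive, elevated) for one blood sample.

    detected_mutations: genetic biomarkers found in circulating DNA.
    protein_levels: measured level per peptide biomarker.
    elevated_cutoffs: hypothetical level above which a biomarker counts as elevated.
    """
    elevated = {
        name: level
        for name, level in protein_levels.items()
        if name in elevated_cutoffs and level > elevated_cutoffs[name]
    }
    # Positive when either class of biomarker is present, or both.
    positive = bool(detected_mutations) or bool(elevated)
    return positive, elevated


# Hypothetical cutoffs and measurements in arbitrary units.
cutoffs = {"CA125": 35.0, "CEA": 5.0}

print(identify_cancer([], {"CA125": 10.0, "CEA": 1.0}, cutoffs))
print(identify_cancer(["hypothetical_KRAS_mutation"], {"CEA": 1.0}, cutoffs))
print(identify_cancer([], {"CA125": 50.0, "CEA": 1.0}, cutoffs))
```

Because the two biomarker classes are combined with a logical OR, a sample with no detectable mutation can still be called positive on protein levels alone, and the reverse.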

In another aspect, this document features a method for treating a mammal having cancer. The method can include, or consist essentially of, detecting one or more genetic biomarkers in circulating DNA in a blood sample obtained from a mammal; detecting an elevated level of one or more peptide biomarkers in the blood sample obtained from a mammal; and administering one or more cancer treatments to said mammal when the presence of one or more genetic biomarkers is detected in circulating DNA in said blood sample, when an elevated level of one or more peptide biomarkers is detected in said blood sample, or both. The one or more cancer treatments can include surgery, chemotherapy, hormone therapy, targeted therapy, radiation therapy, and combinations thereof. The mammal can be a human. The blood sample can be a plasma sample. The cancer can be a Stage I cancer. The cancer can be a liver cancer, an ovary cancer, an esophageal cancer, a stomach cancer, a pancreatic cancer, a colorectal cancer, a lung cancer, a breast cancer, or a prostate cancer. The one or more genetic biomarkers can include one or more modifications in one or more genes. The one or more modifications can include inactivating modifications, and the one or more genes can include tumor suppressor genes. The one or more modifications can independently be selected from single base substitutions, insertions, and deletions. The one or more genes can include NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS. The one or more modifications can independently be selected from the modifications set forth in Table 5. The step of detecting the presence of one or more genetic biomarkers can be performed using a multiplex PCR-based sequencing assay. 
The multiplex PCR-based sequencing assay can include assigning a unique identifier (UID) to each template molecule; amplifying each uniquely tagged template molecule to create UID-families; and redundantly sequencing the amplification products. The one or more peptide biomarkers can include prolactin, OPN, IL-6, CEA, CA125, HGF, Myeloperoxidase, CA19-9, Midkine, and/or TIMP-1. The step of detecting the level of one or more peptide biomarkers can be performed using a multiplex immunoassay system.

In another aspect, this document features a method for identifying the location of a cancer in a mammal. The method can include, or consist essentially of, detecting one or more genetic biomarkers in circulating DNA in a blood sample obtained from a mammal; detecting an elevated level of one or more peptide biomarkers in the blood sample obtained from a mammal; and identifying the location of the cancer in the mammal when the presence of one or more genetic biomarkers is detected in circulating DNA in said blood sample, when an elevated level of one or more peptide biomarkers is detected in said blood sample, or both. The mammal can be a human. The blood sample can be a plasma sample. The cancer can be a Stage I cancer. The cancer can be a liver cancer, an ovarian cancer, an esophageal cancer, a stomach cancer, a pancreatic cancer, a colorectal cancer, a lung cancer, a breast cancer, or a prostate cancer. The one or more genetic biomarkers can include one or more modifications in one or more genes. The one or more modifications can include inactivating modifications, and the one or more genes can include tumor suppressor genes. The one or more modifications can independently be selected from single base substitutions, insertions, and deletions. The one or more genes can include NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS. The one or more modifications can independently be selected from the modifications set forth in Table 5. The step of detecting the presence of one or more genetic biomarkers can be performed using a multiplex PCR-based sequencing assay. The multiplex PCR-based sequencing assay can include assigning a unique identifier (UID) to each template molecule; amplifying each uniquely tagged template molecule to create UID-families; and redundantly sequencing the amplification products. 
The one or more peptide biomarkers can include prolactin, OPN, IL-6, CEA, CA125, HGF, myeloperoxidase, CA19-9, midkine, and/or TIMP-1. The step of detecting the level of one or more peptide biomarkers can be performed using a multiplex immunoassay system.

Provided herein are methods for identifying the presence of a cancer in a human subject comprising: detecting in a first biological sample isolated from the human subject the presence of one or more genetic alterations in cell-free DNA derived from a gene selected from the group consisting of: AKT1, APC, BRAF, CDKN2A, CTNNB1, FBXW7, FGFR2, GNAS, HRAS, KRAS, PPP2R1A, TP53, PTEN, PIK3CA, EGFR and NRAS, and combinations thereof; detecting a level of one or more protein biomarkers in a second biological sample isolated from the human subject, wherein the protein biomarker is selected from the group consisting of: carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3, and combinations thereof; comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers; and identifying the presence of the cancer in the human subject when the presence of one or more genetic alterations in the cell-free DNA is detected, the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers, or both. In some embodiments, the first biological sample comprises blood, plasma, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. In some embodiments, the first biological sample, the second biological sample, or both comprise plasma. In some embodiments, the first and second biological samples are the same.

In some embodiments of identifying the presence of a cancer in a human subject, the step of detecting the presence of one or more genetic alterations in cell-free DNA in the first biological sample comprises amplifying an amplicon comprising codons and their surrounding splice sites, wherein the codons are selected from the group consisting of: codons 16-18 of AKT1; codons 1304-1311 or 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58 or 76-88 of CDKN2A; codons 31-39 or 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, or 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, or 143-148 of KRAS; codons 3-15 or 54-63 of NRAS; codons 80-90, 343-348, 541-551, or 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, or 145-154 of PTEN; codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, or 374-386 of TP53, and combinations thereof. 
In some embodiments, the step of detecting the presence of one or more genetic alterations in cell-free DNA in the first biological sample comprises sequencing gene regions comprising codons and their surrounding splice sites, wherein the codons are selected from the group consisting of: codons 16-18 of AKT1; codons 1304-1311 or 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58 or 76-88 of CDKN2A; codons 31-39 or 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, or 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, or 143-148 of KRAS; codons 3-15 or 54-63 of NRAS; codons 80-90, 343-348, 541-551, or 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, or 145-154 of PTEN; codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, or 374-386 of TP53, and combinations thereof. 
In some embodiments, the step of detecting the presence of one or more genetic alterations in cell-free DNA in the first biological sample comprises sequencing gene regions comprising codons and their surrounding splice sites from each of: codons 16-18 of AKT1; codons 1304-1311 and 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58 and 76-88 of CDKN2A; codons 31-39 and 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, and 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, and 143-148 of KRAS; codons 3-15 and 54-63 of NRAS; codons 80-90, 343-348, 541-551, and 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, and 145-154 of PTEN; codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, and 374-386 of TP53, and combinations thereof.

In some embodiments of identifying the presence of a cancer in a human subject, the one or more protein biomarkers include CA19-9. In some embodiments, the reference level of the CA19-9 protein biomarker is 92 U/mL. In some embodiments, the one or more protein biomarkers include CEA. In some embodiments, the reference level of the CEA protein biomarker is 7.5 ng/mL. In some embodiments, the one or more protein biomarkers include HGF. In some embodiments, the reference level of the HGF protein biomarker is 0.89 ng/mL. In some embodiments, the one or more protein biomarkers include OPN. In some embodiments, the reference level of the OPN protein biomarker is 158 ng/mL. In some embodiments, the one or more protein biomarkers include CA125. In some embodiments, the reference level of the CA125 protein biomarker is 577 U/mL. In some embodiments, the one or more protein biomarkers include AFP. In some embodiments, the reference level of the AFP protein biomarker is 21 ng/mL. In some embodiments, the one or more protein biomarkers include prolactin. In some embodiments, the reference level of the prolactin protein biomarker is 145 ng/mL. In some embodiments, the one or more protein biomarkers include TIMP-1. In some embodiments, the reference level of the TIMP-1 protein biomarker is 177 ng/mL. In some embodiments, the one or more protein biomarkers include follistatin. In some embodiments, the reference level of the follistatin protein biomarker is 2 ng/mL. In some embodiments, the one or more protein biomarkers include G-CSF. In some embodiments, the reference level of the G-CSF protein biomarker is 800 pg/mL. In some embodiments, the one or more protein biomarkers include CA15-3. In some embodiments, the reference level of the CA15-3 protein biomarker is 98 U/mL.
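
The comparison of detected protein levels to the reference levels listed above can be sketched as a simple lookup. The reference values and units below are those recited in the preceding paragraph; the dictionary name, the function name, and the assumption that each measured level is already expressed in the same unit as its reference level are illustrative only.

```python
# Reference levels recited above, as (value, unit) pairs.
REFERENCE_LEVELS = {
    "CA19-9": (92.0, "U/mL"),
    "CEA": (7.5, "ng/mL"),
    "HGF": (0.89, "ng/mL"),
    "OPN": (158.0, "ng/mL"),
    "CA125": (577.0, "U/mL"),
    "AFP": (21.0, "ng/mL"),
    "prolactin": (145.0, "ng/mL"),
    "TIMP-1": (177.0, "ng/mL"),
    "follistatin": (2.0, "ng/mL"),
    "G-CSF": (800.0, "pg/mL"),
    "CA15-3": (98.0, "U/mL"),
}


def elevated_biomarkers(detected_levels):
    """Return the names of biomarkers whose detected level exceeds its reference.

    detected_levels: dict mapping biomarker name to a measured value.
    Assumption: each value uses the same unit as the matching reference.
    Biomarkers without a listed reference level are ignored.
    """
    elevated = []
    for name, level in detected_levels.items():
        if name in REFERENCE_LEVELS and level > REFERENCE_LEVELS[name][0]:
            elevated.append(name)
    return sorted(elevated)


# Hypothetical measurements for one plasma sample.
sample_levels = {"CEA": 9.0, "OPN": 100.0, "G-CSF": 950.0}
sample_elevated = elevated_biomarkers(sample_levels)
```

A level equal to the reference is not treated as elevated here, consistent with the "higher than the reference levels" wording; other embodiments may use different reference levels.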

In some embodiments of identifying the presence of a cancer in a human subject, the presence of the cancer in the human subject is identified when: (i) the presence of one or more genetic alterations in cell-free DNA is detected, and (ii) the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers.
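
The two identification rules described above (a genetic alteration, an elevated protein level, or both; versus both required together) can be written as a small decision function. The function and parameter names are hypothetical, and the sketch only restates the logic of the text rather than any validated classifier.

```python
def cancer_identified(alteration_detected, protein_elevated, require_both=False):
    """Combine the genetic and protein findings into a single call.

    require_both=False: either finding, or both, identifies the cancer.
    require_both=True: both findings (i) and (ii) must be present.
    """
    if require_both:
        return alteration_detected and protein_elevated
    return alteration_detected or protein_elevated
```

Requiring both findings trades sensitivity for specificity, whereas accepting either finding does the reverse.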

In some embodiments of identifying the presence of a cancer in a humansubject, the presence of one or more genetic alterations in cell-freeDNA in the first biological sample is detected by amplifying thecell-free DNA to form families of amplicons in which each member of afamily is derived from a single template molecule in the cell-free DNA,wherein each member of a family is marked by a common oligonucleotidebarcode, and wherein each family is marked by a distinct oligonucleotidebarcode. In some embodiments, the oligonucleotide barcode is introducedinto the template molecule by a step of amplifying with a population ofprimers which collectively contain a plurality of oligonucleotidebarcodes. In some embodiments, the oligonucleotide barcode is endogenousto the template molecule, and an adapter comprising a DNA synthesispriming site is ligated to an end of the template molecule adjacent tothe oligonucleotide barcode.

In some embodiments of identifying the presence of a cancer in a human subject, a therapeutic intervention is administered to the subject when the presence of cancer is identified. In some embodiments, the therapeutic intervention is selected from the group consisting of: adoptive T cell therapy, radiation therapy, surgery, administration of a chemotherapeutic agent, administration of an immune checkpoint inhibitor, administration of a targeted therapy, administration of a kinase inhibitor, administration of a signal transduction inhibitor, administration of a bispecific antibody, administration of a monoclonal antibody, and combinations thereof. In some embodiments, the therapeutic intervention is administered at a time when the human subject has an early-stage cancer, and wherein the therapeutic intervention is more effective than if the therapeutic intervention were to be administered to a human subject at a later time.

In some embodiments of identifying the presence of a cancer in a human subject, the presence of cancer in the human subject is detected at a time prior to diagnosis of the human subject with cancer. In some embodiments, the presence of cancer in the human subject is detected at a time prior to the human subject exhibiting symptoms associated with cancer.

In some embodiments of identifying the presence of a cancer in a human subject, the human subject is subjected to radiologic scanning of an organ or body region to identify the location of the cancer. In some embodiments, the human subject is subjected to whole body radiologic scanning to identify the location of the cancer. In some embodiments, the scanning is a positron emission tomography-computed tomography (PET-CT) scan.

In some embodiments, the cancer is selected from the group consisting of: pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, and breast cancer, and combinations thereof.

Also provided herein are methods for identifying the presence of cancer in a human subject comprising: detecting a level of one or more protein biomarkers in a first biological sample isolated from the human subject, wherein the one or more protein biomarkers are selected from the group consisting of: carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), CA125, AFP, prolactin, TIMP-1, follistatin, G-CSF, and CA15-3, and combinations thereof; comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers; and identifying the presence of cancer in the human subject when the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers.

Provided herein are methods for identifying the presence of pancreatic cancer in a human subject comprising: detecting in a first biological sample isolated from the human subject the presence of one or more genetic alterations in cell-free DNA derived from a gene selected from the group consisting of: KRAS, TP53, CDKN2A, SMAD4, and combinations thereof; detecting a level of one or more protein biomarkers in a second biological sample isolated from the human subject, wherein the one or more protein biomarkers are selected from the group consisting of: carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), and combinations thereof; comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers; and identifying the presence of pancreatic cancer in the human subject when the presence of one or more genetic alterations in the cell-free DNA is detected, the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers, or both. In some embodiments, the first biological sample comprises blood, plasma, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. In some embodiments, the first biological sample, the second biological sample, or both comprise plasma. In some embodiments, the first and second biological samples are the same.

In some embodiments of methods for identifying the presence of pancreatic cancer in a human subject, the one or more genetic alterations occur in codon 12 or codon 61 of the KRAS gene. In some embodiments, the one or more protein biomarkers include CA19-9. In some embodiments, the reference level of the CA19-9 protein biomarker is 100 U/mL. In some embodiments, the one or more protein biomarkers include CEA. In some embodiments, the reference level of the CEA protein biomarker is 7.5 ng/mL. In some embodiments, the one or more protein biomarkers include HGF. In some embodiments, the reference level of the HGF protein biomarker is 0.92 ng/mL. In some embodiments, the one or more protein biomarkers include OPN. In some embodiments, the reference level of the OPN protein biomarker is 158 ng/mL. In some embodiments, detecting the level of the one or more protein biomarkers comprises detecting the levels of each of CA19-9, CEA, HGF, and OPN; and comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers comprises comparing the detected levels of each of CA19-9, CEA, HGF, and OPN to a reference level of each of CA19-9, CEA, HGF, and OPN.

In some embodiments of methods for identifying the presence of pancreatic cancer in a human subject, the presence of pancreatic cancer in the human subject is identified when: (i) the presence of one or more genetic alterations in cell-free DNA derived from the KRAS gene is detected, and (ii) the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers. In some embodiments of methods for identifying the presence of pancreatic cancer in a human subject, the one or more genetic alterations are detected by amplifying the cell-free DNA to form families of amplicons in which each member of a family is derived from a single template molecule in the cell-free DNA, wherein each member of a family is marked by a common oligonucleotide barcode, and wherein each family is marked by a distinct oligonucleotide barcode. In some embodiments, the oligonucleotide barcode is introduced into the template molecule by a step of amplifying with a population of primers which collectively contain a plurality of oligonucleotide barcodes. In some embodiments, the oligonucleotide barcode is endogenous to the template molecule, and an adapter comprising a DNA synthesis priming site is ligated to an end of the template molecule adjacent to the oligonucleotide barcode.

In some embodiments, a therapeutic intervention is administered to the human subject when the presence of pancreatic cancer is identified. In some embodiments, the therapeutic intervention is selected from the group consisting of: adoptive T cell therapy, radiation therapy, surgery, administration of a chemotherapeutic agent, administration of an immune checkpoint inhibitor, administration of a targeted therapy, administration of a kinase inhibitor, administration of a signal transduction inhibitor, administration of a bispecific antibody, administration of a monoclonal antibody, and combinations thereof. In some embodiments, the therapeutic intervention is administered at a time when the human subject has an early-stage pancreatic cancer, and wherein the therapeutic intervention is more effective than if the therapeutic intervention were to be administered to a human subject at a later time.

In some embodiments, the presence of pancreatic cancer in the human subject is detected at a time prior to diagnosis of the human subject with pancreatic cancer. In some embodiments, the presence of pancreatic cancer in the human subject is detected at a time prior to the human subject exhibiting symptoms associated with pancreatic cancer.

Also provided herein are methods for identifying the presence of pancreatic cancer in a human subject comprising: detecting a level of one or more protein biomarkers in a first biological sample isolated from the human subject, wherein the one or more protein biomarkers are selected from the group consisting of: carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), and combinations thereof; comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers; and identifying the presence of pancreatic cancer in the human subject when the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers.

Provided herein are methods for identifying the presence of a cancer in a human subject comprising: detecting in a first biological sample isolated from the subject the presence of one or more genetic alterations in cell-free DNA derived from a gene selected from the group consisting of: AKT1, APC, BRAF, CDKN2A, CTNNB1, FBXW7, FGFR2, GNAS, HRAS, KRAS, PPP2R1A, TP53, PTEN, PIK3CA, EGFR and NRAS, and combinations thereof; detecting in a second biological sample isolated from the subject the absence of one or more genetic alterations in DNA derived from a gene selected from the group consisting of: AKT1, APC, BRAF, CDKN2A, CTNNB1, FBXW7, FGFR2, GNAS, HRAS, KRAS, PPP2R1A, TP53, PTEN, PIK3CA, EGFR and NRAS, and combinations thereof, wherein the second biological sample is isolated from white blood cells of the subject; and identifying the presence of the cancer in the subject when the one or more genetic alterations that are detected in the first sample are not detected in the second sample. In some embodiments, the first biological sample, the second biological sample, or both comprise blood, plasma, urine, cerebrospinal fluid, saliva, sputum, broncho-alveolar lavage, bile, lymphatic fluid, cyst fluid, stool, ascites, and combinations thereof. In some embodiments, the first biological sample, the second biological sample, or both comprise plasma. In some embodiments, the first and second biological samples are the same. 
In some embodiments of identifying the presence of a cancer in a human subject, the methods further comprise detecting a level of one or more protein biomarkers in a third biological sample isolated from the subject, wherein the protein biomarker is selected from the group consisting of: carbohydrate antigen 19-9 (CA19-9), carcinoembryonic antigen (CEA), hepatocyte growth factor (HGF), osteopontin (OPN), CA125, AFP, prolactin, follistatin, G-CSF, and CA15-3, and combinations thereof; comparing the detected levels of the one or more protein biomarkers to one or more reference levels of the protein biomarkers; and identifying the presence of the cancer in the subject when the presence of one or more genetic alterations in the cell-free DNA is detected in the first sample, the absence of one or more genetic alterations is detected in DNA from the second sample, and the detected levels of the one or more protein biomarkers are higher than the reference levels of the one or more protein biomarkers. In some embodiments, the third biological sample is the same as the first biological sample.
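
The comparison between cell-free DNA and white blood cell DNA described above amounts to keeping only those alterations seen in the first sample that are absent from the second. A minimal sketch follows; the function name and the representation of an alteration as a `(gene, change)` tuple are assumptions made for illustration, and the example alterations are hypothetical inputs rather than results.

```python
def alterations_absent_from_wbc(cfdna_alterations, wbc_alterations):
    """Return cell-free DNA alterations that are not found in white blood cell DNA.

    Each alteration is an illustrative (gene, change) tuple. Alterations
    also present in white blood cell DNA are excluded, since they are not
    evidence of the cancer under the method described above.
    """
    wbc = set(wbc_alterations)
    return [alt for alt in cfdna_alterations if alt not in wbc]


def cancer_identified_by_wbc_comparison(cfdna_alterations, wbc_alterations):
    """True when at least one cfDNA alteration is absent from the WBC sample."""
    return bool(alterations_absent_from_wbc(cfdna_alterations, wbc_alterations))


# Hypothetical calls from matched samples of one subject.
cfdna_calls = [("KRAS", "G12D"), ("TP53", "R175H")]
wbc_calls = [("TP53", "R175H")]
remaining_calls = alterations_absent_from_wbc(cfdna_calls, wbc_calls)
```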

In some embodiments of identifying the presence of a cancer in a human subject, the step of detecting the presence of one or more genetic alterations in cell-free DNA in the first biological sample, the step of detecting the absence of one or more genetic alterations in DNA in the second biological sample, or both comprises amplifying an amplicon comprising codons and their surrounding splice sites, wherein the codons are selected from the group consisting of: codons 16-18 of AKT1; codons 1304-1311 or 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58 or 76-88 of CDKN2A; codons 31-39 or 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, or 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, or 143-148 of KRAS; codons 3-15 or 54-63 of NRAS; codons 80-90, 343-348, 541-551, or 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, or 145-154 of PTEN; and codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, or 374-386 of TP53. 
In some embodiments, the step of detecting the presence of one or more genetic alterations in cell-free DNA in the first biological sample, the step of detecting the absence of one or more genetic alterations in DNA in the second biological sample, or both comprises sequencing gene regions comprising codons and their surrounding splice sites, wherein the codons are selected from the group consisting of: codons 16-18 of AKT1; codons 1304-1311 or 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58 or 76-88 of CDKN2A; codons 31-39 or 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, or 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, or 143-148 of KRAS; codons 3-15 or 54-63 of NRAS; codons 80-90, 343-348, 541-551, or 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, or 145-154 of PTEN; and codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, or 374-386 of TP53. 
In some embodiments, the step of detecting the presence of one or more genetic alterations in cell-free DNA in the first biological sample, the step of detecting the absence of one or more genetic alterations in DNA in the second biological sample, or both comprises sequencing gene regions comprising codons and their surrounding splice sites from each of: codons 16-18 of AKT1; codons 1304-1311 and 1450-1459 of APC; codons 591-602 of BRAF; codons 51-58 and 76-88 of CDKN2A; codons 31-39 and 38-47 of CTNNB1; codons 856-868 of EGFR; codons 361-371, 464-473, 473-483, and 498-507 of FBXW7; codons 250-256 of FGFR2; codons 199-208 of GNAS; codons 7-19 of HRAS; codons 7-14, 57-65, and 143-148 of KRAS; codons 3-15 and 54-63 of NRAS; codons 80-90, 343-348, 541-551, and 1038-1050 of PIK3CA; codons 175-187 of PPP2R1A; codons 90-98, 125-132, 133-146, and 145-154 of PTEN; and codons 10-22, 25-32, 33-40, 40-52, 52-64, 82-94, 97-110, 112-125, 123-125, 126-132, 132-142, 150-163, 167-177, 175-186, 187-195, 195-206, 207-219, 219-224, 226-237, 232-245, 248-261, 261-268, 272-283, 279-290, 298-307, 307-314, 323-331, 333-344, 344-355, 367-375, and 374-386 of TP53.

In some embodiments of identifying the presence of a cancer in a human subject, the one or more protein biomarkers include CA19-9. In some embodiments, the reference level of the CA19-9 protein biomarker is 92 U/mL. In some embodiments, the one or more protein biomarkers include CEA. In some embodiments, the reference level of the CEA protein biomarker is 7.5 ng/mL. In some embodiments, the one or more protein biomarkers include HGF. In some embodiments, the reference level of the HGF protein biomarker is 0.89 ng/mL. In some embodiments, the one or more protein biomarkers include OPN. In some embodiments, the reference level of the OPN protein biomarker is 158 ng/mL. In some embodiments, the one or more protein biomarkers include CA125. In some embodiments, the reference level of the CA125 protein biomarker is 577 U/mL. In some embodiments, the one or more protein biomarkers include AFP. In some embodiments, the reference level of the AFP protein biomarker is 21 ng/mL. In some embodiments, the one or more protein biomarkers include prolactin. In some embodiments, the reference level of the prolactin protein biomarker is 145 ng/mL. In some embodiments, the one or more protein biomarkers include TIMP-1. In some embodiments, the reference level of the TIMP-1 protein biomarker is 177 ng/mL. In some embodiments, the one or more protein biomarkers include follistatin. In some embodiments, the reference level of the follistatin protein biomarker is 2 ng/mL. In some embodiments, the one or more protein biomarkers include G-CSF. In some embodiments, the reference level of the G-CSF protein biomarker is 800 pg/mL. In some embodiments, the one or more protein biomarkers include CA15-3. In some embodiments, the reference level of the CA15-3 protein biomarker is 98 U/mL.

In some embodiments of identifying the presence of a cancer in a human subject, the presence of one or more genetic alterations in cell-free DNA in the first biological sample, the absence of one or more genetic alterations in DNA in the second biological sample, or both are detected by amplifying the cell-free DNA to form families of amplicons in which each member of a family is derived from a single template molecule in the cell-free DNA, wherein each member of a family is marked by a common oligonucleotide barcode, and wherein each family is marked by a distinct oligonucleotide barcode. In some embodiments, the oligonucleotide barcode is introduced into the template molecule by a step of amplifying with a population of primers which collectively contain a plurality of oligonucleotide barcodes. In some embodiments, the oligonucleotide barcode is endogenous to the template molecule, and an adapter comprising a DNA synthesis priming site is ligated to an end of the template molecule adjacent to the oligonucleotide barcode.

In some embodiments, a therapeutic intervention is administered to the subject when the presence of cancer is identified. In some embodiments, the therapeutic intervention is selected from the group consisting of: adoptive T cell therapy, radiation therapy, surgery, administration of a chemotherapeutic agent, administration of an immune checkpoint inhibitor, administration of a targeted therapy, administration of a kinase inhibitor, administration of a signal transduction inhibitor, administration of a bispecific antibody, administration of a monoclonal antibody, and combinations thereof. In some embodiments, the therapeutic intervention is administered at a time when the subject has an early-stage cancer, and wherein the therapeutic intervention is more effective than if the therapeutic intervention were to be administered to a subject at a later time.

In some embodiments of identifying the presence of a cancer in a human subject, the presence of cancer in the subject is detected at a time prior to diagnosis of the subject with cancer. In some embodiments, the presence of cancer in the subject is detected at a time prior to the subject exhibiting symptoms associated with cancer.

In some embodiments of identifying the presence of a cancer in a human subject, the human subject is subjected to radiologic scanning of an organ or body region to identify the location of the cancer. In some embodiments, the human subject is subjected to whole body radiologic scanning to identify the location of the cancer. In some embodiments, the scanning is a positron emission tomography-computed tomography (PET-CT) scan.

In some embodiments of identifying the presence of a cancer in a human subject, the cancer is selected from the group consisting of: pancreatic cancer, colon cancer, esophageal cancer, stomach cancer, ovarian cancer, liver cancer, lung cancer, and breast cancer, and combinations thereof.

As described in more detail herein, it has been shown that DNA from sampled fluid (e.g., fluid containing cells sampled from the endocervical canal) can be used in an assay (e.g., a PCR-based, multiplex test) to simultaneously assess genetic alterations that commonly occur in endometrial or ovarian cancers (FIG. 19). Additionally, as described in more detail herein and without limitation, two ways to increase sensitivity were identified. First, intrauterine sampling (with a “Tao brush”) was tested, a method that allows sample collection closer to the anatomical site of the tumors. Second, in a recent study, it was shown that testing for mutations in both saliva and plasma from the same individual increased the sensitivity of detecting head and neck tumors (Wang et al., Detection of somatic mutations and HPV in the saliva and plasma of patients with head and neck squamous cell carcinomas. Sci Transl Med 7, 293ra104 (2015)). Based on this precedent, it was shown that testing for mutations in both the plasma and Pap test fluid can increase sensitivity for cancers (e.g., ovarian cancers).

In some aspects, provided herein are methods of detecting endometrial and ovarian cancers based on genetic analyses of DNA recovered from fluids (e.g., fluids obtained during a routine Papanicolaou (Pap) test). In some embodiments, this new test, called PapSEEK, incorporates assays for mutations in one or more of 18 genes and/or an assay for aneuploidy. In some embodiments, methods provided herein to detect gynecologic cancers are used at a stage when the cancers are more likely to be curable. In some embodiments, PapSEEK can be combined with assays for mutations in one or more genes in nucleic acids present in a plasma sample.

In some embodiments, provided herein are methods of detecting ovarian orendometrial cancer in a subject that include detecting in a sampleobtained from the subject the presence of one or more mutations in oneor more genes selected from the group consisting of: NRAS, PTEN, FGFR2,KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7,PIK3R1, APC, EGFR, BRAF, and CDKN2A, detecting in the sample thepresence of aneuploidy, or both, wherein the presence of one or moremutations in NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A,MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A, thepresence of aneuploidy, or both indicates that the subject has ovarianor endometrial cancer. In some embodiments, the step of detecting thepresence of one or more mutations in NRAS, PTEN, FGFR2, KRAS, POLE,AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC,EGFR, BRAF, CDKN2A is performed using a PCR-based multiplex assay, usinga PCR-based singleplex assay, a digital PCR assay, a droplet digital PCR(ddPCR) assay, a microarray assay, a next-generation sequencing assay, aSanger sequencing assay, a quantitative PCR assay, or a ligation assay.In some embodiments, the step of detecting the presence of one or moremutations in NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A,MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, CDKN2A isperformed by increasing the sensitivity of massively parallel sequencinginstruments with an error reduction technique that allows for thedetection of rare mutant alleles in a range of 1 mutant template among100 to 1,000,000 wild-type templates. 
For example, the step of detecting one or more mutations can be performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique comprising: a) molecularly assigning a unique identifier (UID) to each template molecule, b) amplifying each uniquely tagged template molecule to create UID-families, and c) redundantly sequencing the amplification products. In some embodiments, methods provided herein further include conducting cytology on the sample, wherein presence of the one or more mutations in NRAS, PTEN, FGFR2, KRAS, POLE, AKT1, TP53, RNF43, PPP2R1A, MAPK1, CTNNB1, PIK3CA, FBXW7, PIK3R1, APC, EGFR, BRAF, or CDKN2A, the presence of aneuploidy, and/or a positive cytology indicates that the subject has ovarian or endometrial cancer. In some embodiments, the step of detecting in the sample the presence of aneuploidy comprises detecting the presence of one or more alterations on one or more of chromosome arms 4p, 7q, 8q, and 9q. In some embodiments, the step of detecting in the sample the presence of aneuploidy comprises amplifying interspersed nucleotide elements. In some embodiments, methods provided herein further include detecting in a second sample comprising circulating tumor DNA (ctDNA) the presence of at least one mutation in one or more genes selected from the group consisting of: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, and TP53. In some embodiments, the second sample includes plasma. In some embodiments, the sample is collected via intrauterine sampling. In some embodiments, the sample is collected with a Tao brush. In some embodiments, methods further include administering to the subject a therapy (e.g., adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, immune checkpoint inhibitors, and combinations thereof).
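By way of illustration only, the error reduction steps a) through c) recited above can be sketched in Python as follows. The sketch assumes that reads covering a single genomic position have already been reduced to (UID, base) pairs. The function names, the minimum family size of 2, and the 95% agreement cutoff are hypothetical choices made for this example and are not taken from this document.

```python
from collections import Counter, defaultdict


def call_uid_consensus(reads, min_family_size=2, min_agreement=0.95):
    """Group reads into UID-families and call one consensus base per family.

    reads: iterable of (uid, base) pairs observed at one genomic position.
    min_family_size and min_agreement are illustrative assumptions.
    """
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)

    consensus = {}
    for uid, bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few redundant reads to trust this template
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[uid] = base  # family members agree on the base
    return consensus


def mutant_template_fraction(consensus, reference_base):
    """Fraction of consensus templates whose base differs from the reference."""
    if not consensus:
        return 0.0
    mutant = sum(1 for base in consensus.values() if base != reference_base)
    return mutant / len(consensus)
```

In this sketch, a base change that appears in only some reads of a UID-family is treated as an amplification or sequencing error and the family is discarded, whereas a change shared by essentially all reads of a family is counted as one mutant template.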

In some embodiments, provided herein are methods of detecting endometrial cancer in a subject that include detecting in a sample obtained from the subject the presence of one or more mutations in one or more genes selected from the group consisting of: PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A, detecting in the sample the presence of aneuploidy, or both, wherein the presence of one or more mutations in PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A, the presence of aneuploidy, or both indicates that the subject has endometrial cancer. In some embodiments, the step of detecting the presence of one or more mutations in PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A is performed using a PCR-based multiplex assay, a PCR-based singleplex assay, a digital PCR assay, a droplet digital PCR (ddPCR) assay, a microarray assay, a next-generation sequencing assay, a Sanger sequencing assay, a quantitative PCR assay, or a ligation assay. In some embodiments, the step of detecting the presence of one or more mutations in PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A is performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique that allows for the detection of rare mutant alleles in a range of 1 mutant template among 100 to 1,000,000 wild-type templates. For example, the step of detecting one or more mutations can be performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique comprising: a) molecularly assigning a unique identifier (UID) to each template molecule, b) amplifying each uniquely tagged template molecule to create UID-families, and c) redundantly sequencing the amplification products.
In some embodiments, methods provided herein further include conducting cytology on the sample, wherein presence of the one or more mutations in PTEN, TP53, PIK3CA, PIK3R1, CTNNB1, KRAS, FGFR2, POLE, APC, FBXW7, RNF43, and PPP2R1A, the presence of aneuploidy, and/or a positive cytology indicates that the subject has endometrial cancer. In some embodiments, the step of detecting in the sample the presence of aneuploidy comprises detecting the presence of one or more alterations on one or more of chromosome arms 4p, 7q, 8q, and 9q. In some embodiments, the step of detecting in the sample the presence of aneuploidy comprises amplifying interspersed nucleotide elements. In some embodiments, methods provided herein further include detecting in a second sample comprising circulating tumor DNA (ctDNA) the presence of at least one mutation in one or more genes selected from the group consisting of: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, and TP53. In some embodiments, the second sample includes plasma. In some embodiments, the sample is collected via intrauterine sampling. In some embodiments, the sample is collected with a Tao brush. In some embodiments, methods further include administering to the subject a therapy (e.g., adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, immune checkpoint inhibitors, and combinations thereof).

In some embodiments, provided herein are methods of detecting ovarian cancer in a subject that include detecting in a sample obtained from the subject the presence of one or more mutations in TP53, detecting in the sample the presence of aneuploidy, or both, wherein the presence of one or more mutations in TP53, the presence of aneuploidy, or both indicates that the subject has ovarian cancer. In some embodiments, the step of detecting the presence of one or more mutations in TP53 is performed using a PCR-based multiplex assay, a PCR-based singleplex assay, a digital PCR assay, a droplet digital PCR (ddPCR) assay, a microarray assay, a next-generation sequencing assay, a Sanger sequencing assay, a quantitative PCR assay, or a ligation assay. In some embodiments, the step of detecting the presence of one or more mutations in TP53 is performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique that allows for the detection of rare mutant alleles in a range of 1 mutant template among 100 to 1,000,000 wild-type templates. For example, the step of detecting one or more mutations can be performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique comprising: a) molecularly assigning a unique identifier (UID) to each template molecule, b) amplifying each uniquely tagged template molecule to create UID-families, and c) redundantly sequencing the amplification products. In some embodiments, methods provided herein further include conducting cytology on the sample, wherein presence of the one or more mutations in TP53, the presence of aneuploidy, and/or a positive cytology indicates that the subject has ovarian cancer. In some embodiments, the step of detecting in the sample the presence of aneuploidy comprises detecting the presence of one or more alterations on one or more of chromosome arms 4p, 7q, 8q, and 9q.
In some embodiments, the step of detecting in the sample the presence of aneuploidy comprises amplifying interspersed nucleotide elements. In some embodiments, methods provided herein further include detecting in a second sample comprising circulating tumor DNA (ctDNA) the presence of at least one mutation in one or more genes selected from the group consisting of: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, and TP53. In some embodiments, the second sample includes plasma. In some embodiments, the sample is collected via intrauterine sampling. In some embodiments, the sample is collected with a Tao brush. In some embodiments, methods further include administering to the subject a therapy (e.g., adjuvant chemotherapy, neoadjuvant chemotherapy, radiation therapy, immunotherapy, targeted therapy, immune checkpoint inhibitors, and combinations thereof).

Provided herein are methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject that include: detecting in a urinary sample obtained from the subject the presence of one or more mutations in one or more genes selected from the group consisting of: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL, detecting in the sample the presence of at least one mutation in a TERT promoter, and detecting in the sample the presence of aneuploidy, wherein presence of one or more mutations in the group consisting of: TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, and VHL, the presence of the at least one mutation in the TERT promoter, or the presence of aneuploidy indicates that the subject has bladder cancer. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the one or more genes are TP53, FGFR3, or both. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, or VHL, the one or more mutations in the TERT promoter, or both, are present in 0.03% or fewer of the urinary cells in the sample.

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed using a PCR based multiplex assay, a Sanger sequencing assay, a next-generation sequencing assay, a quantitative PCR assay, a droplet digital PCR (ddPCR) assay, or a microarray technique. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed using a Sanger sequencing assay. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed using a next-generation sequencing assay.

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the step of detecting in the sample the presence of aneuploidy comprises detecting the presence of one or more alterations on one or more of chromosome arms 5q, 8q, and 9p. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the step of detecting the presence of aneuploidy comprises amplifying long interspersed nucleotide elements (LINEs).

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique that allows for the detection of rare mutant alleles in a range of 1 mutant template among 5,000 to 1,000,000 wild-type templates. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique that includes: a) assigning a unique identifier (UID) to each template molecule, b) amplifying each uniquely tagged template molecule to create UID-families, and c) redundantly sequencing the amplification products.

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the method further includes performing cytology on the sample, wherein presence of the one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the presence of the at least one mutation in the TERT promoter, the presence of aneuploidy, or a positive cytology indicates that the subject has bladder cancer.
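As a minimal sketch of the decision rule recited in the preceding paragraph, in which any one of the listed findings indicates cancer, the four results can be combined with a logical OR. The class name, field names, and function name below are hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class UrinaryAssayResult:
    """Per-sample findings; all field names are illustrative assumptions."""
    gene_mutation: bool            # mutation in any gene of the ten-gene panel
    tert_promoter_mutation: bool   # at least one TERT promoter mutation
    aneuploidy: bool               # aneuploidy detected in the sample
    positive_cytology: bool = False  # optional cytology result


def indicates_cancer(result: UrinaryAssayResult) -> bool:
    """Return True when any single component of the test is positive."""
    return any((
        result.gene_mutation,
        result.tert_promoter_mutation,
        result.aneuploidy,
        result.positive_cytology,
    ))
```

For example, a sample with only a TERT promoter mutation would be scored positive under this rule, while a sample negative for all four components would be scored negative.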

In some embodiments of methods of detecting bladder cancer in a subject, the method further includes administering transurethral resection of the bladder (TURB), intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, cystectomy or cystoprostatectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination thereof.

In some embodiments of methods of detecting an upper tract urothelial carcinoma in a subject, the method further includes administering transurethral resection, intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, ureterectomy or nephroureterectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination thereof.

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the cancer is a low-grade tumor. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the cancer is a low-grade tumor, the low-grade tumor is a papillary urothelial neoplasm of low malignant potential (PUNLMP) or a non-invasive low-grade papillary urothelial carcinoma. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the cancer is a low-grade tumor, the method further includes administering transurethral resection of the bladder (TURB).

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject, the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the method includes detecting in a urinary sample obtained from the subject the presence of one or more mutations in TP53, FGFR3, or both. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, or VHL, one or more mutations in the TERT promoter, or both, are present in 0.03% or fewer of the urinary cells in the sample. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed using a PCR based multiplex assay. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed using a Sanger sequencing assay.
In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed using a next-generation sequencing assay.

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the step of detecting in the sample the presence of aneuploidy comprises detecting the presence of one or more alterations on one or more of chromosome arms 5q, 8q, and 9p. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the step of detecting the presence of aneuploidy comprises amplifying long interspersed nucleotide elements (LINEs).

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique that allows for the detection of rare mutant alleles in a range of 1 mutant template among 5,000 to 1,000,000 wild-type templates. In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the step of detecting the presence of one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the step of detecting the presence of at least one mutation in the TERT promoter, or both is performed by increasing the sensitivity of massively parallel sequencing instruments with an error reduction technique that includes: a) assigning a unique identifier (UID) to each template molecule, b) amplifying each uniquely tagged template molecule to create UID-families, and c) redundantly sequencing the amplification products.

In some embodiments of methods of detecting bladder cancer or an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the method further includes conducting cytology on the sample, wherein presence of the one or more mutations in TP53, PIK3CA, FGFR3, KRAS, ERBB2, CDKN2A, MLL, HRAS, MET, VHL, the presence of the at least one mutation in the TERT promoter, the presence of aneuploidy, or a positive cytology indicates that the subject has bladder cancer.

In some embodiments of methods of detecting bladder cancer in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the method further includes administering transurethral resection of the bladder (TURB), intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, cystectomy or cystoprostatectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination thereof.

In some embodiments of methods of detecting an upper tract urothelial carcinoma in a subject in which the subject has previously undergone treatment for bladder cancer or an upper tract urothelial carcinoma, the method further includes administering transurethral resection, intravesical BCG (Bacillus Calmette-Guerin), intravesical chemotherapy, adjuvant chemotherapy, neoadjuvant chemotherapy, ureterectomy or nephroureterectomy, radiation therapy, immunotherapy, immune checkpoint inhibitors, or any combination thereof.

This document provides methods and materials for identifying one or more chromosomal anomalies (e.g., aneuploidy). In some cases, this document provides methods and materials for using amplicon-based sequencing data to identify a mammal as having a disease or disorder associated with one or more chromosomal anomalies. For example, methods and materials described herein can be applied to a sample obtained from a mammal to identify the mammal as having one or more chromosomal anomalies. For example, methods and materials described herein can be applied to a sample obtained from a mammal to identify the mammal as having a disease or disorder associated with one or more chromosomal anomalies. For example, a prenatal mammal can be identified as having a disease or disorder based, at least in part, on the presence of one or more aneuploidies. This document also provides methods and materials for identifying and/or treating a disease associated with one or more chromosomal anomalies. In some cases, one or more chromosomal anomalies can be identified in DNA obtained from a sample obtained from a mammal. For example, a mammal identified as having cancer based, at least in part, on the presence of one or more chromosomal anomalies can be treated with one or more cancer treatments.

As demonstrated herein, a new approach, called WALDO (for Within-Sample-AneupLoidy-DetectiOn), can be used to evaluate the sequencing data obtained from amplicons to identify the presence of one or more chromosomal anomalies (e.g., aneuploidy). For example, WALDO can employ supervised machine learning to detect the small changes in multiple chromosome arms that are often present in cancers. As described herein, WALDO was used to search for chromosome arm gains and losses in 1,677 tumors as well as in 1,522 liquid biopsies of blood from cancer patients or normal individuals. Aneuploidy was detected in 95% of cancer biopsies and in 22% of liquid biopsies. Using single nucleotide polymorphisms (SNPs) within the amplified long interspersed nucleotide elements (LINEs), WALDO concomitantly assessed allelic imbalances, microsatellite instability, and sample identification. WALDO can be used on samples containing only a few nanograms (ng) of DNA and having as little as 1% neoplastic content.

Having the ability to use amplicon-based sequencing reads to detect one or more chromosomal anomalies provides a unique and unrealized opportunity to achieve high coverage depth with improved sensitivity at relatively low cost. Moreover, the ability to use amplicon-based sequencing reads allows the detection of one or more chromosomal anomalies (e.g., aneuploidies) from samples containing limited amounts of DNA. This approach can be used in a variety of applications including, but not limited to, diagnostics (e.g., prenatal diagnostics and/or cancer diagnostics) and forensic science.

In general, one aspect of this document features a method for detecting aneuploidy in a genome of a mammal. The method includes, or consists essentially of, sequencing a plurality of amplicons obtained from a sample obtained from the mammal to obtain sequencing reads; grouping the sequencing reads into clusters of genomic intervals; calculating sums of distributions of the sequencing reads in each genomic interval using the equation $\sum_{i=1}^{I} R_{i} \sim N\left(\sum_{i=1}^{I} \mu_{i}, \sum_{i=1}^{I} \sigma_{i}^{2}\right)$, where $R_{i}$ is the number of sequencing reads, $I$ is the number of clusters on a chromosome arm, $N$ is a Gaussian distribution with parameters $\mu_{i}$ and $\sigma_{i}^{2}$, where $\mu_{i}$ is the mean number of sequencing reads in each genomic interval, and where $\sigma_{i}^{2}$ is the variance of sequencing reads in each genomic interval; calculating a Z-score of a chromosome arm using the quantile function $1 - \mathrm{CDF}\left(\sum_{i=1}^{I} \mu_{i}, \sum_{i=1}^{I} \sigma_{i}^{2}\right)$; and identifying the presence of an aneuploidy in the genome of the mammal when the Z-score is outside a significance threshold. The plurality of amplicons can include from about 10,000 amplicons to about 1,000,000 amplicons (e.g., the plurality of amplicons can include about 38,000 amplicons). The genomic intervals can include from about 100 nucleotides to about 125,000,000 nucleotides (e.g., the genomic intervals can include about 500,000 nucleotides). The mammal can be a human. The sample can be a liquid biopsy. The liquid biopsy can be blood, urine, saliva, cyst fluid, sputum, tissue, stool, pap smears, or cerebral spinal fluid. The liquid biopsy can be a blood sample (e.g., a plasma sample). The blood sample can include cell free fetal DNA. The sample can include a neoplastic cell fraction. The neoplastic cell fraction in the sample can include less than about 1% (e.g., less than about 0.5%) of the entire sample. The amplicons can include unique long interspersed nucleotide elements (LINEs).
The sequencing step also can include amplifying DNA from the sample obtained from the mammal to obtain the amplicons. The amplifying can be performed using a single primer pair. The amplicons can include about 100 to about 140 base pairs. Each amplicon can be sequenced between 1 and 20 times. The method can include about 100,000 to about 25 million sequencing reads. Each cluster can include about two hundred genomic intervals. The method also can include supervised machine learning. The supervised machine learning can employ a support vector machine model.
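The arm-level calculation recited in this aspect can be illustrated with the following Python sketch. It assumes that the per-cluster means and variances have already been estimated by some upstream step, which is not shown. The function names and the threshold of 3 are hypothetical values chosen for this example, and the standardized score computed here is one conventional reading of the Z-score described above.

```python
from math import sqrt
from statistics import NormalDist


def arm_z_score(observed_reads, means, variances):
    """Score one chromosome arm from its per-cluster read counts.

    observed_reads, means, variances: equal-length sequences, one entry per
    cluster on the arm. The summed reads are modeled as
    N(sum(means), sum(variances)), as in the equation above.
    """
    total = sum(observed_reads)
    mu = sum(means)
    sigma = sqrt(sum(variances))
    z = (total - mu) / sigma                         # standardized deviation
    upper_tail = 1.0 - NormalDist(mu, sigma).cdf(total)  # 1 - CDF at the total
    return z, upper_tail


def is_aneuploid(z, threshold=3.0):
    """Flag a gain or loss when |Z| exceeds an assumed significance threshold."""
    return abs(z) > threshold
```

Because independent Gaussian variables add in both mean and variance, summing over all clusters on an arm pools many small per-cluster deviations into one arm-level statistic, which is what allows subtle arm gains or losses to be detected.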

In another aspect, this document features a method for detecting one or more polymorphisms in a genome of a mammal. The method includes, or consists essentially of, sequencing a plurality of amplicons obtained from a sample from the mammal to obtain variant sequencing reads; sequencing a plurality of amplicons obtained from a reference sample from the mammal to obtain reference sequencing reads; grouping the variant sequencing reads and the reference sequencing reads into clusters of genomic intervals; selecting a chromosome arm having a sum of the variant sequencing reads and the reference sequencing reads on both alleles that is greater than about 3; determining a variant-allele frequency (VAF) of the selected chromosome arm, wherein said VAF is the number of variant sequencing reads/total number of sequencing reads; and identifying the presence of one or more polymorphisms on said selected chromosome arm of the mammal if the VAF is between about 0.2 and about 0.8. The sequencing step can include assigning a unique identifier (UID) to each amplicon, amplifying each uniquely tagged amplicon to create UID-families, and redundantly sequencing the amplification products. The sequencing step can further include calculating a Z-score of a variant on said selected chromosome arm using the equation

${Z \sim \frac{\sum\limits_{i = 1}^{k}{w_{i}Z_{i}}}{\sqrt{\sum\limits_{i = 1}^{k}w_{i}^{2}}}},$

where $w_{i}$ is the UID depth at variant i, $Z_{i}$ is the Z-score of variant i, and k is the number of variants observed on the chromosome arm. The one or more polymorphisms can include single base substitutions, insertions, deletions, indels, and/or combinations thereof. The pluralities of amplicons can include from about 10,000 amplicons to about 1,000,000 amplicons (e.g., the pluralities of amplicons can include about 38,000 amplicons). The genomic intervals can include from about 100 nucleotides to about 125,000,000 nucleotides (e.g., the genomic intervals can include about 500,000 nucleotides). The mammal can be a human. The sample can be a liquid biopsy. The liquid biopsy can be blood, urine, saliva, cyst fluid, sputum, tissue, stool, pap smears, or cerebral spinal fluid. The liquid biopsy can be a blood sample (e.g., a plasma sample). The blood sample can include cell free fetal DNA. The sample can include a neoplastic cell fraction. The neoplastic cell fraction in the sample can include less than about 1% (e.g., less than about 0.5%) of the entire sample. The amplicons can include unique long interspersed nucleotide elements (LINEs). The sequencing step also can include amplifying DNA from the sample obtained from the mammal to obtain the amplicons. The amplifying can be performed using a single primer pair. The amplicons can include about 100 to about 140 base pairs. Each amplicon can be sequenced between 1 and 20 times. The method can include about 100,000 to about 25 million sequencing reads. Each cluster can include about two hundred genomic intervals. The method also can include supervised machine learning. The supervised machine learning can employ a support vector machine model.
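As an illustration of the two calculations recited in this aspect, the following Python sketch computes a variant-allele frequency and the depth-weighted combined Z-score given by the equation above. The function names are hypothetical, and treating the bounds of about 0.2 and about 0.8 as inclusive is an assumption made for this example.

```python
from math import sqrt


def variant_allele_frequency(variant_reads, total_reads):
    """VAF = number of variant sequencing reads / total number of reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def is_candidate_polymorphism(vaf, low=0.2, high=0.8):
    """Apply the VAF window; inclusive bounds are an illustrative assumption."""
    return low <= vaf <= high


def combined_arm_z(z_scores, uid_depths):
    """Combine per-variant Z-scores on one arm, weighting each by UID depth.

    Implements Z = sum(w_i * Z_i) / sqrt(sum(w_i ** 2)) over the k variants.
    """
    if not z_scores or len(z_scores) != len(uid_depths):
        raise ValueError("need one UID depth per variant Z-score")
    numerator = sum(w * z for w, z in zip(uid_depths, z_scores))
    denominator = sqrt(sum(w * w for w in uid_depths))
    return numerator / denominator
```

With two variants having Z-scores of 1.0 and 2.0 and UID depths of 3 and 4, the combined score is (3 + 8) / 5 = 2.2, so the more deeply covered variant contributes more to the arm-level result.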

In some embodiments, provided herein are methods of evaluating a subject for the presence of a cancer. The methods can include, or consist essentially of, detecting one or more genetic biomarkers in a biological sample containing DNA obtained from the subject, where the one or more genetic biomarkers are mutations, where the genetic biomarkers are present in at least four driver genes, and where each driver gene is associated with the cancer. Detecting the one or more genetic biomarkers can include sequencing a plurality of regions of interest within the driver genes, where each region of interest contains at least one genetic biomarker. The sensitivity of the method can be at least 70%, where detection of additional genetic biomarkers in additional regions of interest does not substantially increase the sensitivity of the method. The detecting step can include determining the sequences of the genetic biomarkers within the regions of interest. Determining the sequences of additional genetic biomarkers does not substantially increase the sensitivity of the method. The detecting step can include providing the sequences of the regions of interest. Providing the sequences of additional regions of interest does not substantially increase the sensitivity of the method. The detecting step can include amplifying each region of interest by PCR to generate a plurality of amplicons. Amplification of additional regions of interest to generate additional amplicons does not substantially increase the sensitivity of the method. Amplification of additional regions of interest to generate additional amplicons does not substantially decrease the specificity of the method. Detection of additional genetic biomarkers in additional regions of interest can increase the probability of a false-positive result.
The cancer can be lung cancer, pancreatic cancer, liver cancer, esophageal cancer, stomach cancer, head and neck cancer, ovarian cancer, colorectal cancer, bladder cancer, cervical cancer, uterine cancer, endometrial cancer, kidney cancer, breast cancer, prostate cancer, brain cancer, or sarcoma. The cancer can be liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer. The cancer can be liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, or breast cancer. The method can be used to evaluate the subject for a plurality of cancers. The plurality can include at least 4 cancers. The plurality can include 5 to 8 cancers. The cancer can be a blood cancer. No more than 70 regions of interest can be sequenced. At least 30 regions of interest can be sequenced. From 30 to 70 regions of interest can be sequenced. About 60 regions of interest can be sequenced. The number of regions of interest sequenced can be no greater than 125% of the lowest number that achieves plateau for sensitivity of detection of cancer. Each region of interest can include no more than 800 bp. Each region of interest can include at least 6 bp. Each region of interest comprises from 6 bp to 800 bp. Each region of interest comprises 14 bp to 42 bp. Each region of interest can be PCR-amplified to generate a plurality of amplicons. Each amplicon can include no more than 800 bp. Each amplicon can include at least 6 bp. Each amplicon can include from 6 bp to 800 bp. Each amplicon can include from 66 bp to 80 bp. The detecting step can include sequencing no more than 20,000 bp. The detecting step can include sequencing at least 200 bp. The detecting step can include sequencing from 200 bp to 20,000 bp. The detecting step can include sequencing 2000±15% bp. The detecting step can include sequencing about 2,000 bp.
The detecting step can include sequencing each region of interest with at least 5× sequencing depth. The detecting step can include sequencing each region of interest with no more than 500× sequencing depth. The detecting step can include sequencing each region of interest with from 5× to 500× sequencing depth. The detecting step can include sequencing each region of interest to a depth of at least 50,000 reads per base. The detecting step can include sequencing each region of interest to a depth of no more than 150,000 reads per base. The detecting step can include sequencing each region of interest to a depth of from 50,000 reads per base to 150,000 reads per base. The detecting step can include sequencing each region of interest at a depth sufficient to detect a mutation in said region of interest at a frequency as low as 0.0005%. The detecting step can include sequencing no more than 300 bp of each region of interest. The detecting step can include sequencing at least 6 bp of each region of interest. The detecting step can include sequencing from 6 bp to 300 bp of each region of interest. The detecting step can include sequencing about 33 bp of each region of interest. The method also can include detecting a level of one or more peptide biomarkers in the biological sample, where an elevated level of each peptide biomarker is associated with the cancer. For example, the method also can include comparing the detected levels of each peptide biomarker to a reference level for the peptide biomarker, and identifying the presence of a cancer in the subject when an elevated level of at least one peptide biomarker is detected. The subject can be one who has not been determined to have a cancer. The subject can be one who has not been determined to harbor a cancer cell. The subject can be one who has not exhibited a symptom associated with a cancer. The subject can be a pediatric subject. The subject can be an adult subject. The sample can be a tumor sample.
The sample can be a liquid sample. When the sample is a liquid sample, the liquid sample can be a blood sample, and the DNA can be circulating tumor DNA or cell-free DNA. The at least four driver genes can include at least four of the genes set forth in Tables 60 and 61. The at least four driver genes can be selected from NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, GNAS, and combinations thereof. The at least four driver genes can include from 5 to 16 genes. The at least four driver genes can be selected from KRAS, PIK3CA, HRAS, CDKN2A, TP53, AKT1, CTNNB1, APC, EGFR, GNAS, PPP2R1A, BRAF, FBXW7, PTEN, FGFR2, and combinations thereof; and the cancer can be selected from liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, and prostate cancer. The at least four driver genes can be selected from KRAS, PIK3CA, HRAS, CDKN2A, TP53, TERT, ERBB2, FGFR3, MET, MLL, VHL, and combinations thereof; and the cancer can be selected from a bladder cancer and an upper tract urothelial carcinoma (UTUC). The at least four driver genes can be selected from KRAS, PIK3CA, CDKN2A, TP53, CTNNB1, PPP2R1A, BRAF, PTEN, CSMD3, FAT3, BRCA, ARID1A, and combinations thereof; and the cancer can be selected from an ovarian cancer and an endometrial cancer. The at least four driver genes can be selected from KRAS, PIK3CA, CDKN2A, TP53, CTNNB1, GNAS, BRAF, NRAS, VHL, RNF43, SMAD4, and combinations thereof; and the cancer can be a pancreatic cancer. The one or more peptide biomarkers can include from 5 to 8 peptide biomarkers. The one or more peptide biomarkers can be selected from CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, MPO, and combinations thereof. The PCR-amplification can include assigning a unique identifier (UID) to each region of interest, amplifying each region of interest with its assigned UID to generate uniquely tagged UID-families of amplification products, and redundantly sequencing the amplification products.
The method also can include detecting the presence of aneuploidy in the biological sample containing DNA. Detecting the presence of aneuploidy can include estimating somatic mutation load, estimating carcinogen signature, and/or detecting microsatellite instability (MSI). Detecting the presence of aneuploidy can include comparing the estimated somatic mutation load, the estimated carcinogen signature, and/or the detected MSI to a reference level of somatic mutation load, carcinogen signature, and/or MSI. Detecting the presence of aneuploidy can increase the specificity and/or sensitivity of the method. The presence of aneuploidy can be detected on one or more of chromosome arms 4p, 7q, 8q, and 9q. The method also can include determining the cancer type and/or the origin of the cancer.
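
By way of non-limiting illustration, arm-level gains or losses such as those on chromosome arms 4p, 7q, 8q, and 9q can be flagged by comparing a test sample with reference samples. The following Python sketch uses a generic z-score comparison of per-arm read fractions. It is an assumption made for illustration only and is not the WALDO method described in Example 6. The function names, the read fractions, and the threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev


def arm_z_scores(sample_fractions, reference_fractions):
    """Z-score of each arm's read fraction against reference samples.

    sample_fractions: {arm: fraction of reads mapping to that arm}
    reference_fractions: {arm: [fractions observed in reference samples]}
    """
    scores = {}
    for arm, value in sample_fractions.items():
        ref = reference_fractions[arm]
        scores[arm] = (value - mean(ref)) / stdev(ref)
    return scores


def altered_arms(z_scores, threshold=3.0):
    """Label arms whose |z| meets the (hypothetical) threshold as gain or loss."""
    return {
        arm: ("gain" if z > 0 else "loss")
        for arm, z in z_scores.items()
        if abs(z) >= threshold
    }


# Hypothetical read fractions, not measured data.
reference = {
    "4p": [0.030, 0.031, 0.029, 0.030, 0.030],
    "8q": [0.040, 0.041, 0.039, 0.040, 0.040],
    "9q": [0.035, 0.036, 0.034, 0.035, 0.035],
}
sample = {"4p": 0.030, "8q": 0.046, "9q": 0.025}

calls = altered_arms(arm_z_scores(sample, reference))
print(calls)
```

In this sketch an arm is reported only when its read fraction departs from the reference distribution by at least the chosen number of standard deviations; a practical implementation would also need to account for GC content, mappability, and sample-to-sample variation in depth.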

In some embodiments, provided herein are methods of evaluating a subject for the presence of any of a plurality of cancers. The methods can include, or consist essentially of, detecting in a biological sample obtained from the subject the presence of one or more driver gene mutations, in each of one or more driver genes, where each driver gene is associated with the presence of a cancer in the plurality of cancers; thereby evaluating the subject for the presence of any of the plurality of cancers. The number of driver gene mutations detected can be sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each driver gene is associated is not substantially increased by the detection of one or more additional driver gene mutations. Detecting the one or more driver gene mutations can include providing a sequence of the one or more driver gene mutations. Detecting the one or more driver gene mutations can include sequencing one or more subgenomic intervals or amplicons that include the driver gene mutation. The number of subgenomic intervals or amplicons sequenced is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each driver gene is associated is not substantially increased by sequencing one or more additional subgenomic intervals or amplicons. The plurality of cancers can include 4, 5, 6, 7 or 8 cancers. The plurality of cancers can be chosen from two or more of liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, and prostate cancer. At least 30 and not more than 400 subgenomic intervals or amplicons from the driver genes can be sequenced. No more than 150 subgenomic intervals or amplicons from the driver genes can be sequenced. Each subgenomic interval or amplicon can include 6-800 bp. Each subgenomic interval or amplicon can include at least 500 bp and no more than 3000 bp.
Each subgenomic interval or amplicon can include 2000 bp ±15%. At least 6 bp and no more than 300 bp in each driver gene can be sequenced. The subject has not yet been determined to have a cancer. The subject has not yet been determined to harbor a cancer cell. The subject does not exhibit, or has not exhibited, a symptom associated with a cancer. The driver gene can be chosen from a gene disclosed in Table 60 or 61. The one or more driver genes can include 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 genes chosen from Tables 60 and 61. The one or more driver genes can include KRAS, PIK3CA, HRAS, CDKN2A, TP53, AKT1, CTNNB1, APC, EGFR, GNAS, PPP2R1A, BRAF, FBXW7, PTEN, or FGFR2, or a combination thereof. The cancer of any of the plurality of cancers can be chosen from liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, and prostate cancer. The method also can include detecting the level of each of one or more protein biomarkers in the biological sample, where the level of each protein biomarker is associated with the presence of a cancer of the plurality of cancers. In some cases, the method also can include comparing the detected levels of each protein biomarker to a reference level for the protein biomarker, and identifying the presence of a cancer of the plurality of cancers in the subject when the presence of one or more protein biomarkers is detected. The biological sample can be a tumor sample, a circulating tumor DNA sample, a solid tumor biopsy sample, or a fixed tumor sample. The biological sample can be a blood sample. The biological sample can be an apheresis sample. The biological sample can be a cell-free DNA sample. The biological sample can be a first biological sample and can include a DNA sample. The DNA sample can include cell-free DNA or circulating tumor DNA. The biological sample can be a second biological sample and can include a protein sample.
The protein biomarker can include one or more protein biomarkers selected from CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, MPO, and combinations thereof. Detecting the presence of one or more driver gene mutations can include assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample, amplifying each uniquely tagged template molecule to create UID-families, and redundantly sequencing the amplification products.
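
By way of non-limiting illustration, the analysis that follows redundant sequencing of UID-tagged templates can be sketched as grouping reads into UID-families and keeping a sequence only when the members of a family agree. The Python sketch below is a simplified assumption and not a definitive implementation: the function name `uid_family_consensus`, the minimum family size of 2, and the 95% agreement threshold are hypothetical, and the reads are invented for the example.

```python
from collections import Counter, defaultdict


def uid_family_consensus(reads, min_family_size=2, min_agreement=0.95):
    """Collapse (uid, sequence) reads into one consensus sequence per UID-family.

    Families smaller than min_family_size, or whose most common sequence is
    supported by less than min_agreement of their reads, are discarded.
    """
    families = defaultdict(list)
    for uid, sequence in reads:
        families[uid].append(sequence)

    consensus = {}
    for uid, sequences in families.items():
        if len(sequences) < min_family_size:
            continue  # not redundantly sequenced
        top_sequence, count = Counter(sequences).most_common(1)[0]
        if count / len(sequences) >= min_agreement:
            consensus[uid] = top_sequence
    return consensus


# Hypothetical reads: (UID, sequenced bases at a region of interest).
reads = [
    ("AAC", "GGT"), ("AAC", "GGT"), ("AAC", "GGT"),  # concordant family
    ("TTG", "GAT"), ("TTG", "GAT"),                  # concordant family
    ("CCA", "GGT"),                                  # single read, dropped
    ("GGA", "GGT"), ("GGA", "GAT"),                  # discordant, dropped
]

print(uid_family_consensus(reads))
```

Requiring agreement within a family is intended to separate a variant present in the original template molecule from an error introduced during amplification or sequencing, because such an error would be expected in only a subset of the family's reads.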

In some embodiments, provided herein are methods of evaluating a subject for the presence of any of a plurality of cancers. The methods can include, or consist essentially of, (a) detecting in a biological sample obtained from the subject the presence of one or more driver gene mutations, in each of one or more driver genes, where one or more of the driver genes is chosen from KRAS, PIK3CA, HRAS, CDKN2A, TP53, AKT1, CTNNB1, APC, EGFR, GNAS, PPP2R1A, BRAF, FBXW7, PTEN, or FGFR2, and combinations thereof, and where each driver gene is associated with the presence of a cancer in the plurality of cancers; and (b) detecting the level of each of one or more protein biomarkers in a biological sample, where the one or more protein biomarkers is chosen from CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or MPO, and combinations thereof, and where the level of each protein biomarker is associated with the presence of a cancer of the plurality of cancers; thereby evaluating the subject for the presence of any of the plurality of cancers, where the presence of a cancer of the plurality of cancers is identified when the presence of one or more driver gene mutations and the level of one or more of the protein biomarkers is detected. The number of driver gene mutations detected can be sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each driver gene is associated is not substantially increased by the detection of one or more additional driver gene mutations.
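
By way of non-limiting illustration, steps (a) and (b) can be combined in a simple decision rule in which a subject is scored positive when at least one driver gene mutation is detected and at least one protein biomarker exceeds its reference level. The Python sketch below is one possible reading of that rule under stated assumptions. The function name `evaluate_subject`, the mutation label, and all biomarker levels and reference levels are hypothetical placeholders and are not clinical thresholds.

```python
def evaluate_subject(driver_mutations, protein_levels, reference_levels):
    """Combine genetic and protein findings into a single call.

    driver_mutations: {gene: [mutations detected in that gene]}
    protein_levels: {biomarker: measured level}
    reference_levels: {biomarker: reference level in the same units}
    """
    mutated_genes = sorted(
        gene for gene, mutations in driver_mutations.items() if mutations
    )
    elevated_proteins = sorted(
        name
        for name, level in protein_levels.items()
        if name in reference_levels and level > reference_levels[name]
    )
    return {
        "mutated_genes": mutated_genes,
        "elevated_proteins": elevated_proteins,
        # Assumed rule: both analyte classes must contribute a finding.
        "positive": bool(mutated_genes) and bool(elevated_proteins),
    }


# Hypothetical inputs in arbitrary units.
result = evaluate_subject(
    driver_mutations={"KRAS": ["G12D"], "TP53": []},
    protein_levels={"CA19-9": 120.0, "CEA": 2.0},
    reference_levels={"CA19-9": 90.0, "CEA": 7.0},
)
print(result)
```

Other combination rules, for example scoring a subject positive on either analyte class alone or weighting the analytes in a statistical model, fit the same interface by changing only the final expression.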

Additional features of any of the methods disclosed herein include one or more of the following enumerated embodiments.

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following enumerated embodiments.

E1. A method of evaluating a subject for the presence of any of a plurality of, e.g., any of at least four, cancers in the subject, comprising:

detecting in a biological sample obtained from the subject, e.g., a cell-free DNA sample, the presence of one or more genetic biomarkers, e.g., one or more mutations (e.g., one or more driver gene mutations), in each of one or more genes (e.g., one or more driver genes, e.g., in at least four driver genes), wherein each gene, e.g., driver gene, is associated with the presence of a cancer of the plurality of cancers,

thereby evaluating the subject for the presence of any of the plurality of, e.g., any of at least four, cancers, e.g., by sequencing one or more subgenomic intervals or amplicons comprising the genetic biomarkers,

wherein the number of biomarkers (e.g., number of driver gene mutations) detected is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each gene, e.g., driver gene, is associated is not substantially increased by the detection of one or more additional genetic biomarkers.

E2. The method of embodiment E1, wherein detecting the genetic biomarker comprises providing, e.g., by sequencing, the sequence (e.g., nucleotide sequence) of the genetic biomarker.

E3. The method of embodiment E2, wherein the number of genetic biomarker sequences provided is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each gene, e.g., driver gene, is associated is not substantially increased by the provision of one or more sequences of additional genetic biomarkers.

E4. The method of embodiment E1, wherein detecting the biomarker comprises providing the sequence (e.g., nucleotide sequence) of one or more subgenomic intervals comprising the genetic biomarker.

E5. The method of embodiment E4, wherein the number of subgenomic interval sequences provided is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each gene, e.g., driver gene, is associated is not substantially increased by the provision of one or more sequences (e.g., nucleotide sequences) of additional subgenomic intervals.

E6. The method of embodiment E1, wherein detecting the genetic biomarker comprises providing the sequence of an amplicon comprising the genetic biomarker.

E7. The method of embodiment E6, wherein the number of amplicon sequences provided is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each gene, e.g., driver gene, is associated is not substantially increased by the provision of one or more sequences of additional amplicons.

E8. The method of embodiment E4, wherein the number of subgenomic interval sequences provided is sufficient such that the specificity of detection of the cancer in the plurality of cancers with which each gene, e.g., driver gene, is associated is not substantially decreased by the provision of one or more sequences of additional subgenomic intervals.

E9. The method of embodiment E6, wherein the number of amplicons provided is sufficient such that the specificity of detection of the cancer in the plurality of cancers with which each gene, e.g., driver gene, of the plurality is associated is not substantially decreased by the provision of one or more sequences of additional amplicons.

E8. The method of any of the preceding embodiments, wherein the plurality of cancers comprises 4, 5, 6, 7 or 8 cancers.

E9. The method of any of the preceding embodiments, wherein the plurality of cancers is chosen from solid tumors such as: mesothelioma (e.g., malignant pleural mesothelioma), lung cancer (e.g., non-small cell lung cancer, small cell lung cancer, squamous cell lung cancer, or large cell lung cancer), pancreatic cancer (e.g., pancreatic ductal adenocarcinoma), liver cancer (e.g., hepatocellular carcinoma, or cholangiocarcinoma), esophageal cancer (e.g., esophageal adenocarcinoma or squamous cell carcinoma), head and neck cancer, ovarian cancer, colorectal cancer, bladder cancer, cervical cancer, uterine cancer (endometrial cancer), kidney cancer, breast cancer, prostate cancer, brain cancer (e.g., medulloblastoma, or glioblastoma), or sarcoma (e.g., Ewing sarcoma, osteosarcoma, rhabdomyosarcoma), or a combination thereof.

E10. The method of any of the preceding embodiments, wherein the plurality of cancers is chosen from liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer, or a combination thereof.

E11. The method of any of the preceding embodiments, wherein one or more of the plurality of cancers is chosen from liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, or breast cancer.

E12. The method of any of the preceding embodiments, wherein one or more of the plurality of cancers is a hematological cancer.

E13. The method of any of the preceding embodiments, wherein no more than 60, 100, 150, 200, 300 or 400 subgenomic intervals or amplicons from the one or more genes, e.g., one or more driver genes, e.g., genes listed in Tables 60 and 61, are sequenced.

E14. The method of any of the preceding embodiments, wherein at least 30, 40, 50 or 60 subgenomic intervals or amplicons from the one or more genes, e.g., one or more driver genes, e.g., genes listed in Tables 60 and 61, are sequenced.

E15. The method of any of the preceding embodiments, wherein at least 30 and not more than 400, at least 40 and not more than 300, at least 50 and no more than 200, at least 60 and no more than 150, or at least 60 and no more than 100, subgenomic intervals or amplicons from the one or more genes, e.g., one or more driver genes, e.g., one or more genes listed in Tables 60 and 61, are sequenced.

E16. The method of any of the preceding embodiments, wherein the number of subgenomic intervals or amplicons sequenced for a gene is no greater than 125, 150, 200, or 300% of the lowest number that achieves plateau for sensitivity of detection of the cancer.

E17. The method of any of the preceding embodiments, wherein each subgenomic interval or amplicon comprises 6-800 bp, e.g., 6-750 bp, 6-700 bp, 6-650 bp, 6-600 bp, 6-550 bp, 6-500 bp, 6-450 bp, 6-400 bp, 6-350 bp, 6-300 bp, 6-250 bp, 6-200 bp, 6-150 bp, 6-100 bp, 10-800 bp, 15-800 bp, 20-800 bp, 25-800 bp, 30-800 bp, 35-800 bp, 40-800 bp, 45-800 bp, 50-800 bp, 55-800 bp, 60-800 bp, 65-800 bp, 70-800 bp, 75-800 bp, 80-800 bp, 85-800 bp, 90-800 bp, 95-800 bp, 100-800 bp, 200-800 bp, 300-800 bp, 400-800 bp, 500-800 bp, 600-800 bp, 700-800 bp, 10-700 bp, 20-600 bp, 30-500 bp, 40-400 bp, 50-300 bp, 60-200 bp, 61-150 bp, 62-140 bp, 63-130 bp, 64-120 bp, or 65-100 bp, e.g., 66-80 bp.

E18. The method of any of the preceding embodiments, wherein each subgenomic interval or amplicon comprises about 35, 40, 45, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 100, or 110 bp.

E19. The method of any of the preceding embodiments, wherein each subgenomic interval or amplicon comprises no more than 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, or 800 bp.

E20. The method of any of the preceding embodiments, wherein each subgenomic interval or amplicon comprises at least 6, 10, 15, 20, 25, 30, 35, 40, 45, or 50 bp.

E21. The method of any of the preceding embodiments, wherein each subgenomic interval or amplicon comprises at least 6 bp and no more than 800 bp, at least 10 bp and no more than 700 bp, at least 15 bp and no more than 600 bp, at least 20 bp and no more than 600 bp, at least 25 bp and no more than 500 bp, at least 30 bp and no more than 400 bp, at least 35 bp and no more than 300 bp, at least 40 bp and no more than 200 bp, at least 45 bp and no more than 100 bp, at least 50 bp and no more than 95 bp, or at least 55 bp and no more than 90 bp.

E22. The method of any of the preceding embodiments, wherein each subgenomic interval or amplicon comprises 66-80 bp.

E23. The method of any of the preceding embodiments, wherein the number of subgenomic intervals or amplicons comprises no more than 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 15,000, or 20,000 bp.

E24. The method of any of the preceding embodiments, wherein the number of subgenomic intervals or amplicons comprises at least 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900 or 2000 bp.

E25. The method of any of the preceding embodiments, wherein the number of subgenomic intervals or amplicons comprises at least 200 bp and no more than 20,000 bp, at least 300 bp and no more than 15,000 bp, at least 400 bp and no more than 10,000 bp, at least 500 bp and no more than 9000 bp, at least 600 bp and no more than 8000 bp, at least 700 bp and no more than 7000 bp, at least 800 bp and no more than 6000 bp, at least 900 bp and no more than 5000 bp, at least 1000 bp and no more than 4000 bp, at least 1100 bp and no more than 3500 bp, at least 1200 bp and no more than 3000 bp, at least 1300 bp and no more than 2500 bp, or at least 1500 bp and no more than 2000 bp.

E26. The method of any of the preceding embodiments, wherein the number of subgenomic intervals or amplicons comprises 200±15%, 300±15%, 400±15%, 500±15%, 600±15%, 700±15%, 800±15%, 900±15%, 1000±15%, 1100±15%, 1200±15%, 1300±15%, 1400±15%, 1500±15%, 1600±15%, 1700±15%, 1800±15%, 1900±15%, 2000±15%, 2500±15%, 3000±15%, 3500±15%, 4000±15%, 5000±15%, 6000±15%, 7000±15%, 8000±15%, 9000±15%, 10,000±15%, 15,000±15%, or 20,000 bp±15%, e.g., 2000 bp±15%.

E27. The method of any of the preceding embodiments, wherein the number of subgenomic intervals or amplicons comprises 2000 bp.

E28. The method of any of the preceding embodiments, wherein the average depth to which the number of subgenomic intervals or amplicons is sequenced is at least 5× sequencing depth.

E29. The method of any of the preceding embodiments, wherein the average depth to which the number of subgenomic intervals or amplicons is sequenced is no more than 500× sequencing depth.

E30. The method of any of the preceding embodiments, wherein the average depth to which the number of subgenomic intervals or amplicons is sequenced is from 5× to 500× sequencing depth.

E31. The method of any of the preceding embodiments, wherein said detecting step comprises sequencing each subgenomic interval to a depth of at least 50,000 reads per base.

E32. The method of any of the preceding embodiments, wherein said detecting step comprises sequencing each subgenomic interval to a depth of no more than 150,000 reads per base.

E33. The method of any of the preceding embodiments, wherein said detecting step comprises sequencing each subgenomic interval to a depth of from 50,000 reads per base to 150,000 reads per base.

E34. The method of any of the preceding embodiments, wherein said detecting step comprises sequencing each subgenomic interval at a depth sufficient to detect a mutation in said subgenomic interval at a frequency as low as 0.0005%.
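
As a non-limiting arithmetic illustration, the relationship between sequencing depth and the lowest detectable mutant frequency can be sketched with a simple binomial sampling model, in which the probability of observing at least one mutant read at depth N and mutant fraction f is 1 - (1 - f)^N. The model assumes independent reads, no sequencing error, and detection from a single mutant read. These are simplifying assumptions, and the sketch does not represent how the depths recited herein were derived. The function name `min_depth` and the 95% detection probability are hypothetical.

```python
import math


def min_depth(mutant_fraction, detection_probability=0.95):
    """Smallest depth N such that 1 - (1 - f)**N >= detection_probability."""
    if not 0.0 < mutant_fraction < 1.0:
        raise ValueError("mutant_fraction must be between 0 and 1")
    if not 0.0 < detection_probability < 1.0:
        raise ValueError("detection_probability must be between 0 and 1")
    # Solve (1 - f)**N <= 1 - p for N using logarithms.
    return math.ceil(
        math.log1p(-detection_probability) / math.log1p(-mutant_fraction)
    )


# Depth needed to see at least one mutant read at a 1% mutant fraction.
depth_for_one_percent = min_depth(0.01)
print(depth_for_one_percent)
```

Calling `min_depth(0.0005 / 100)` applies the same model to a mutant frequency of 0.0005%. Because real assays require more than one supporting observation and must tolerate amplification and sequencing errors, depths chosen in practice would be expected to differ from this idealized estimate.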

E35. The method of any of the preceding embodiments, wherein no more than 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40, 45, 50, 55, 60, 100, 200 or 300 bp is sequenced for each biomarker, e.g., each gene, e.g., each driver gene, e.g., each gene disclosed in Table 60 or 61.

E36. The method of any of the preceding embodiments, wherein at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bp is sequenced in each biomarker, e.g., each gene, e.g., each driver gene, e.g., each gene disclosed in Table 60 or 61.

E37. The method of any of the preceding embodiments, wherein at least 6 and no more than 300 bp, at least 7 and no more than 200 bp, at least 8 bp and no more than 100 bp, at least 9 bp and no more than 60 bp, at least 10 bp and no more than 55 bp, at least 11 bp and no more than 50 bp, at least 12 bp and no more than 45 bp, at least 13 bp and no more than 40 bp, at least 14 bp and no more than 35 bp, at least 15 bp and no more than 34 bp, at least 14 bp and no more than 33 bp, at least 15 bp and no more than 32 bp, at least 16 bp and no more than 31 bp, at least 17 bp and no more than 30 bp, at least 18 bp and no more than 29 bp, at least 19 bp and no more than 28 bp, or at least 20 bp and no more than 27 bp is sequenced in each biomarker, e.g., each gene, e.g., each driver gene, e.g., each gene disclosed in Table 60 or 61.

E38. The method of any of the preceding embodiments, wherein about 33 bp is sequenced in each biomarker, e.g., each gene, e.g., each driver gene, e.g., each gene disclosed in Table 60 or 61.

E39. The method of any of the preceding embodiments, wherein detecting the biomarker comprises providing the sequence of the subgenomic interval or amplicon of no more than 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 40, 45, 50, 55, 60, 100, 200 or 300 bp in length and wherein the subgenomic interval or the amplicon comprises the biomarker, e.g., a driver gene comprising a driver mutation.

E40. The method of any of the preceding embodiments, wherein detecting the biomarker comprises providing the sequence of the subgenomic interval or the amplicon of at least 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 bp in length and wherein the subgenomic interval or the amplicon comprises the biomarker, e.g., a driver gene comprising a driver mutation.

E41. The method of any of the preceding embodiments, wherein detecting the biomarker comprises providing the sequence of a subgenomic interval or amplicon of at least 6 and no more than 300 bp, at least 7 and no more than 200 bp, at least 8 bp and no more than 100 bp, at least 9 bp and no more than 60 bp, at least 10 bp and no more than 55 bp, at least 11 bp and no more than 50 bp, at least 12 bp and no more than 45 bp, at least 13 bp and no more than 40 bp, at least 14 bp and no more than 35 bp, at least 15 bp and no more than 34 bp, at least 14 bp and no more than 33 bp, at least 15 bp and no more than 32 bp, at least 16 bp and no more than 31 bp, at least 17 bp and no more than 30 bp, at least 18 bp and no more than 29 bp, at least 19 bp and no more than 28 bp, or at least 20 bp and no more than 27 bp in length and wherein the subgenomic interval or amplicon comprises the biomarker, e.g., a driver gene comprising a driver mutation.

E42. The method of any of the preceding embodiments, wherein detecting the biomarker comprises providing the sequence of a subgenomic interval or amplicon of between 6 bp and 300 bp, 7 bp and 200 bp, 8 bp and 100 bp, 9 bp and 60 bp, 10 bp and 50 bp, 15 bp and 40 bp, or 20 bp and 35 bp in length and wherein the subgenomic interval or amplicon comprises the biomarker, e.g., a driver gene comprising a driver mutation.

E43. The method of any of the preceding embodiments, wherein detecting the biomarker comprises providing the sequence of a subgenomic interval or amplicon of about 33 bp in length and wherein the subgenomic interval or amplicon comprises the biomarker, e.g., a driver gene comprising a driver mutation.

E44. The method of any of the preceding embodiments, further comprising:

(b) detecting the level of each of a plurality of, e.g., at least four, protein biomarkers in a biological sample, wherein the level of each protein biomarker of the plurality is associated with the presence of a cancer of the plurality of cancers;

(c) optionally, comparing the detected levels of each protein biomarker of the plurality of protein biomarkers to a reference level for the protein biomarker; and

(d) identifying the presence of a cancer of the plurality of cancers in the subject when the presence of one or more genetic biomarkers and the level of one of the protein biomarkers of the plurality of protein biomarkers is detected.

E45. The method of any of the preceding embodiments, wherein:

(i) the subject has not yet been determined to have a cancer, e.g., a cancer selected from the plurality of cancers,

(ii) the subject has not yet been determined to harbor a cancer cell, e.g., a cancer cell selected from the plurality of cancers, or

(iii) the subject does not exhibit, or has not exhibited, a symptom associated with a cancer, e.g., a cancer selected from the plurality of cancers.

E46. The method of any of the preceding embodiments, wherein the subject:

(i) is a pediatric subject or a young adult, e.g., aged 6 months to 21 years; or

(ii) is an adult, e.g., aged 18 years or older.

E47. The method of any of the preceding embodiments, wherein the sample comprises a tumor sample, e.g., a biopsy sample (e.g., a liquid biopsy sample (e.g., a circulating tumor DNA sample, or a cell-free DNA sample) or a solid tumor biopsy sample); a blood sample (e.g., a circulating tumor DNA sample, or a cell-free DNA sample), an apheresis sample, a urine sample, a cyst fluid sample (e.g., a pancreatic cyst fluid sample), a Papanicolaou (Pap) sample, or a fixed tumor sample (e.g., a formalin-fixed sample or a paraffin-embedded sample (FFPE)).

E48. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, genes comprises 1, 2, 3, or 4 genes from Tables 60 and 61.

E49. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, genes comprises 5, 6, 7, or 8 genes, chosen from Tables 60 and 61.

E50. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, genes comprises a gene selected from: NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, or GNAS.

E51. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, biomarkers (e.g., one or more genes) is chosen from KRAS, PIK3CA, HRAS, CDKN2A, TP53, AKT1, CTNNB1, APC, EGFR, GNAS, PPP2R1A, BRAF, FBXW7, PTEN, or FGFR2, or a combination thereof, and the cancer is chosen from: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer.

E52. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, biomarkers (e.g., one or more genes) is chosen from KRAS, PIK3CA, HRAS, CDKN2A, TP53, TERT, ERBB2, FGFR3, MET, MLL, or VHL, or a combination thereof, and the cancer is chosen from a bladder cancer or upper tract urothelial carcinoma (UTUC).

E53. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, biomarkers (e.g., one or more genes) is chosen from KRAS, PIK3CA, CDKN2A, TP53, CTNNB1, PPP2R1A, BRAF, PTEN, CSMD3, FAT3, BRCA, or ARID1A, or a combination thereof, and the cancer is an ovarian cancer or an endometrial cancer.

E54. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, biomarkers (e.g., one or more genes) is chosen from KRAS, PIK3CA, CDKN2A, TP53, CTNNB1, GNAS, BRAF, NRAS, VHL, RNF43, or SMAD4, or a combination thereof, and the cancer is a pancreatic cancer, e.g., a pancreatic ductal adenocarcinoma (PDAC).

E55. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, biomarkers comprises 5, 6, 7, or 8 protein biomarkers.

E56. The method of any of the preceding embodiments, wherein the one or more, e.g., plurality of, biomarkers comprises a protein biomarker selected from: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or MPO.

E57. The method of any of the preceding embodiments, wherein detecting the presence of one or more genetic biomarkers comprises:

a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample;

b. amplifying each uniquely tagged template molecule to create UID-families; and

c. redundantly sequencing the amplification products.

E58. The method of any of the preceding embodiments, further comprising detecting the presence of aneuploidy in the sample, e.g., detecting gain or loss in one or more chromosomes, e.g., using the WALDO method as described in Example 6.

E59. The method of embodiment E58, wherein the method comprises: (i) estimating somatic mutation load; (ii) estimating carcinogen signature; and/or (iii) detecting microsatellite instability (MSI).

E60. The method of embodiment E58 or E59, wherein the method can be used to compare two samples, e.g., two unrelated samples, to evaluate genetic similarities between the samples or to find somatic mutations within the samples, e.g., within the LINE elements in the sample.

E61. The method of embodiment E58 or E59, wherein the method results in an increase in specificity and/or sensitivity of aneuploidy detection.

E62. The method of embodiment E58 or E59, wherein the presence of aneuploidy is detected on one or more of, e.g., chromosome arms 4p, 7q, 8q, or 9q.

E63. The method of any of the preceding embodiments, further comprising, responsive to a genetic biomarker and/or a protein biomarker, assigning an origin or cancer type to the cancer.

E64. A method of evaluating a subject for the presence of any of aplurality of cancers in a subject, comprising:

(a) detecting in a biological sample obtained from the subject the presence of one or more driver gene mutations, in each of one or more driver genes, wherein one or more of the driver genes is chosen from KRAS, PIK3CA, HRAS, CDKN2A, TP53, AKT1, CTNNB1, APC, EGFR, GNAS, PPP2R1A, BRAF, FBXW7, PTEN, or FGFR2, or a combination thereof, and wherein each driver gene is associated with the presence of a cancer in the plurality of cancers; and

(b) detecting the level of each of one or more protein biomarkers in a biological sample, wherein the one or more protein biomarkers is chosen from CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or MPO, or a combination thereof, and wherein the level of each protein biomarker is associated with the presence of a cancer of the plurality of cancers,

thereby evaluating the subject for the presence of any of the plurality of cancers,

wherein the presence of a cancer of the plurality of cancers is identified, when the presence of one or more driver gene mutations and the level of one or more of the protein biomarkers is detected.

E65. The method of E64, wherein the number of driver gene mutations detected is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each driver gene is associated is not substantially increased by the detection of one or more additional driver gene mutations.

Other Embodiments

It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

EXAMPLES

Example 1: Detection and Localization of Surgically Resectable Cancers with a Multi-Analyte Liquid Biopsy

Many of the currently approved tests for earlier cancer detection are procedural in nature, and include colonoscopy, mammography, and cervical cytology analysis. To date, the vast majority of cancer patients evaluated with mutation-based liquid biopsies have advanced stage disease. Yet another issue with liquid biopsies is the identification of the underlying organ of origin. Because the same gene mutations drive multiple tumor types, liquid biopsies based on such alterations cannot generally identify the location of the primary tumor giving rise to a positive blood test.

This Example describes a new blood test, called CancerSEEK, which addresses the problematic issues described above. The test utilizes combined assays for genetic alterations and protein biomarkers and has the capacity not only to identify the presence of relatively early cancers but also to pinpoint the organ of origin of these cancers (FIG. 1).

CancerSEEK is a widely applicable, non-invasive test for most cancers. The eight cancer types studied here account for 360,000 (60%) of the estimated cancer deaths in the U.S. in 2017 and their earlier detection could conceivably reduce deaths from these diseases. At the time of this disclosure, the cost of CancerSEEK is less than $500, which is comparable to or lower than other screening tests for single cancers, such as colonoscopy, while this test can detect at least eight different cancer types.

Materials and Methods

Plasma, White Blood Cell and Tumor DNA Samples

The study was approved by the Institutional Review Boards for Human Research at each institution, and samples were obtained after informed consent was obtained. Patients with stage I to III cancer who had undergone surgical resection at the participating institutions were included in the study. Blood was collected from patients before any therapy was undertaken (i.e., before neoadjuvant therapy in those patients receiving neoadjuvant therapy) and before surgery in all patients. If a sample was drawn on the day of surgery, then care was taken to ensure that the blood was collected prior to the administration of anesthesia, as anesthesia can increase the levels of circulating biomarkers (Cohen et al., 2017 Proc Natl Acad Sci USA 114:10202-10207). General demographics, surgical pathology, and AJCC stage (7th edition) were documented. The ‘healthy’ cohort consisted of peripheral blood samples obtained from 812 individuals of median age 55 (interquartile range [IQR] 28 to 65) with no history of cancer. The cancer and healthy control samples were processed in an identical manner. Plasma samples from 46 of the 1,005 cancer patients and 181 of the 812 normal samples had been previously evaluated with a different approach (Cohen et al., 2017 Proc Natl Acad Sci USA 114:10202-10207) (Table 2).

DNA was purified from an average of 7.5 mL plasma using a QIAsymphony circulating DNA kit (cat #1091063), as specified by the manufacturer. DNA from peripheral WBCs was also purified with the QIAsymphony DP DNA Midi Kit (Cat #937255) as specified by the manufacturer. Tumor tissues were formalin-fixed and paraffin-embedded (FFPE) according to standard histopathologic procedures, and DNA was also purified with a QIAsymphony DP DNA Midi Kit (Cat #937255).

Mutation Detection and Analysis

For amplification of DNA from plasma, 61 primer pairs were designed (see below) to amplify 66 to 80 bp segments containing regions of interest from 16 genes.

The 61 primer pairs were divided into two non-overlapping sets containing 28 and 33 primer pairs, respectively. Each of these two primer sets was used to amplify DNA in six independent 25 μl reactions as described elsewhere (see, e.g., Wang et al., 2016 Elife 5) except that 15 cycles were used for the initial amplification. The PCR products were purified with AMPure XP beads (Beckman Coulter, Pa., USA) and 1% of the purified PCR products were then amplified in a second round of PCR as described elsewhere (see, e.g., Wang et al., 2016 Elife 5), but using 21 cycles. PCR products from the second round of amplification were then purified with AMPure and sequenced on an Illumina MiSeq or HiSeq 4000 instrument.

The template-specific portion of the reads was matched to reference sequences using custom scripts written in SQL and C#. Reads from a common template molecule were then grouped based on the unique identifier sequences (UIDs) that were incorporated as molecular barcodes as described elsewhere (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108:9530-9535). Artefactual mutations introduced during the sample preparation or sequencing steps were reduced by requiring a mutation to be present in >90% of reads in each UID family. Redundant reads arising from optical duplication were eliminated by requiring reads with the same UID and sample index to be at least 5,000 pixels apart when located on the same tile. Mutations that met one of the following two criteria were considered: (i) present in the COSMIC database (Forbes et al., 2017 Nucleic Acids Res 45:D777-D783), or (ii) predicted to be inactivating in tumor suppressor genes (nonsense mutations, out-of-frame insertions or deletions, canonical splice site mutations). Synonymous mutations, except those at exon ends, and intronic mutations, except for those at splice sites, were excluded.
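By way of non-limiting illustration, the UID-family filter described above can be sketched in Python as follows. This is not the custom SQL and C# code referred to in the text; the function name `call_supermutants`, the `(uid, mutation)` read representation, and the example mutation labels are assumptions made only for this sketch.

```python
from collections import Counter, defaultdict

def call_supermutants(reads, min_fraction=0.90):
    """Group reads by UID and keep mutations seen in >90% of a family's reads.

    `reads` is an iterable of (uid, mutation) pairs, where `mutation` is None
    for a read that matches the reference (hypothetical representation).
    Returns {uid: mutation} for families that pass the threshold.
    """
    families = defaultdict(list)
    for uid, mutation in reads:
        families[uid].append(mutation)

    supermutants = {}
    for uid, calls in families.items():
        # Most frequent call in this UID family (may be None = reference).
        mutation, count = Counter(calls).most_common(1)[0]
        if mutation is not None and count / len(calls) > min_fraction:
            supermutants[uid] = mutation
    return supermutants

# Hypothetical reads: one clean mutant family, one sequencing error,
# and one family in which only 80% of reads carry the mutation.
example_reads = (
    [("AAA", "KRAS G12D")] * 20
    + [("CCC", None)] * 19 + [("CCC", "TP53 R175H")]
    + [("GGG", "KRAS G12V")] * 8 + [("GGG", None)] * 2
)
example_calls = call_supermutants(example_reads)
```

In this sketch only the first family yields a supermutant; the isolated error in the second family and the 80% family are both discarded.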

Evaluation of Plasma Proteins

The Bioplex 200 platform (Biorad, Hercules Calif.) was used to determine the concentration of multiple target proteins in the plasma samples. Luminex bead based immunoassays (Millipore, Billerica Mass.) were performed following the manufacturer's protocols and concentrations were determined using 5 parameter log curve fits (using Bioplex Manager 6.0) with vendor provided standards and quality controls. The HCCBP1MAG-58K panel was used to detect FGF2, Osteopontin, sFas, IL-8/CXCL8, Prolactin, HE4, HGF, AFP, CA125, IL6, CA15-3, TGFa, CYFRA21-1, CEA, CA19-9 and Leptin. The HANG2MAG-12K panel was used to detect PAR, sPECAM-1, TSP-2, sEGFR, AXL and sHER2/sEGFR2/sErbB2. The HCMBMAG-22K panel was used to detect DKK1, GDF15, Osteoprotegerin (OPG) and Neuron-specific enolase (NSE). The HCCBP4MAG-58K panel was used to detect Kallikrein-6, CD44, Midkine and Mesothelin. The HAGP1MAG-12K panel was used to detect Follistatin, G-CSF, Angiopoietin-2 and Endoglin. The HCCBP3MAG-58K panel was used to detect SHBG, Galectin-3 and Myeloperoxidase. The HTMP1MAG-54K panel was used to detect TIMP-1 and TIMP-2. LRG-1 and Vitronectin were not included in this study since they could not be reproducibly evaluated with a single immunoassay platform.

Algorithm for Classifying ctDNA Status

The classification of a sample's ctDNA status was obtained from a statistical test comparing the normalized mutation frequencies of the sample of interest to the distributions of the normalized mutation frequencies of, respectively, normal and cancer samples in the training set. Specifically, the mutant allele frequency (MAF), defined as the ratio between the number of supermutants and the number of UIDs, was first normalized based on the observed MAFs for each mutation in a set of normal controls. Following this mutation-specific normalization, the MAF of each mutation in each well was compared to two reference distributions of MAFs: 1) a distribution built from the normal control plasmas in the training set plus a set of 188 WBCs from unrelated, healthy individuals, and 2) a distribution built from the cancer samples in a training set that included only mutations found in plasma that were also present with MAF>5% in the corresponding primary tumors. Corresponding p-values, p^(N) and p^(C), were thus obtained. For each mutation, the log ratio of these two p-values was calculated, and the minimum and maximum of these log ratios across the six wells were eliminated so that the results would be less sensitive to outliers. An “omega” score was then determined according to the following formula:

$\Omega = \sum_{i=1}^{4} w_{i} \log \frac{p_{i}^{C}}{p_{i}^{N}},$

where w_(i) is the proportion of UIDs in well i out of the total number of UIDs for that mutation present across the four wells. When a mutation identified in a plasma sample had Ω>1 and was not identified in the primary tumor of the patient, DNA from white blood cells (WBCs) of the same patient was evaluated whenever WBCs were available (23% of the cancer patients). WBC DNA was tested with the same 61-amplicon panel to ensure that the plasma mutation was not a result of Clonal Hematopoiesis of Indeterminate Potential (Jaiswal et al., 2014 N Engl J Med 371:2488-2498). WBCs from the normal individuals were evaluated identically whenever a mutation with Ω>1 was found in the plasma. Any mutation that was identified in the WBCs as well as in the plasma was excluded from the analysis. The requirement for exclusion was that the ratio between the max MAF in the plasma and the max MAF in the WBC was less than 100.
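As a minimal sketch of the Ω computation and the WBC exclusion rule described above, the following Python functions may be considered. The function names, the use of the natural logarithm, and the representation of each well as a (p^(C), p^(N), UID count) triple are assumptions of this sketch; the text does not specify the base of the logarithm or how the p-values themselves are derived from the reference distributions.

```python
import math

def omega_score(p_cancer, p_normal, uid_counts):
    """Omega for one mutation from six per-well p-value pairs.

    The wells with the minimum and maximum log ratio are dropped, and the
    remaining four log ratios are summed with weights equal to each well's
    share of the UIDs across those four wells.
    """
    if not (len(p_cancer) == len(p_normal) == len(uid_counts) == 6):
        raise ValueError("expected values for six wells")
    wells = [
        (math.log(pc / pn), n)  # natural log is an assumption
        for pc, pn, n in zip(p_cancer, p_normal, uid_counts)
    ]
    wells.sort(key=lambda well: well[0])
    kept = wells[1:-1]  # discard the minimum and maximum log ratios
    total_uids = sum(n for _, n in kept)
    return sum(ratio * n / total_uids for ratio, n in kept)

def exclude_as_wbc_derived(max_maf_plasma, max_maf_wbc):
    """True if a plasma mutation also found in WBCs meets the exclusion rule
    (plasma-to-WBC ratio of maximum MAFs below 100)."""
    return max_maf_wbc > 0 and max_maf_plasma / max_maf_wbc < 100
```

For example, with hypothetical per-well log ratios of 0, 1, 2, 3, 4, and 50 and UID counts of 10, 10, 20, 30, 40, and 10, the outlying wells (0 and 50) are dropped and the weighted sum of the remaining four is 3.0.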

The mutation with the greatest Ω score in each patient or normal control was then deemed the “top mutation” and is listed in Table 3. This score was used in the logistic regression as well as the concentrations of the following 10 proteins, selected via an optimization step: Prolactin, OPN, IL6, CEA, CA125, HGF, Myeloperoxidase, CA19-9, Midkine, and TIMP-1. To be conservative, a non-linear transformation was applied to the features used by Logistic Regression. Specifically, if a protein's concentration in the sample of interest was lower than the 95th percentile of the concentration found for that same protein among the normal samples in the training set, then the protein's concentration was set equal to zero; otherwise the log of that concentration was used. For the Ω score, the same log-threshold transformation was used but with a constant threshold equal to 1. The R glmnet package (version 2.10-13) was then used to perform the Logistic Regression, with the lambda parameter set to zero as described elsewhere (see, e.g., Friedman et al., 2010 Journal of Statistical Software 33:74862). The importance of each feature was evaluated by multiplying its coefficient (see below) by the difference of the feature's mean between normal and cancer samples. Ten rounds of 10-fold cross-validations were performed. The classification calls obtained in an average round of 10-fold cross-validation (CV) are listed for each of the 812 normal individuals and 1,005 cancer patients in Table 2.

Logistic regression model coefficients and importance scores.

Feature            Logistic Regression Coefficient    Importance Score
Ω score            1.77E+00                           7.55E+00
CA-125             4.15E−02                           1.37E+00
CEA                2.33E−04                           1.17E+00
CA19-9             1.20E−02                           5.18E−01
Prolactin          3.51E−05                           4.76E−01
HGF                2.45E−03                           3.03E−01
OPN                1.45E−05                           1.72E−01
Myeloperoxidase    5.40E−03                           9.31E−02
TIMP-1             7.34E−06                           7.05E−02
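The log-threshold transformation and the importance score described above can be illustrated with the following Python sketch. It is not the R code used in the study. The linear-interpolation percentile, the natural logarithm, the function names, and the sign convention (cancer mean minus normal mean) are assumptions of this sketch.

```python
import math

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def transform_protein(concentration, normal_training_values):
    """Zero below the 95th percentile of the training normals, else log."""
    threshold = percentile(normal_training_values, 95)
    return 0.0 if concentration < threshold else math.log(concentration)

def transform_omega(omega, threshold=1.0):
    """Same log-threshold transformation with a constant threshold of 1."""
    return 0.0 if omega < threshold else math.log(omega)

def importance(coefficient, mean_cancer, mean_normal):
    """Coefficient multiplied by the difference in feature means."""
    return coefficient * (mean_cancer - mean_normal)
```

Under this transformation, protein levels within the range seen in most normal samples contribute nothing to the regression, which is one way to read the "conservative" intent stated above.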

For prediction of cancer type, the same 11 features (mutation score and levels of ten proteins) were used plus patient gender and the other 29 proteins evaluated in this study. Cancer type prediction was performed only on the cancer samples that were correctly classified as cancer by Logistic Regression. Random Forest, as implemented in the randomForest package (version 4.6-12) (see, e.g., Liaw et al., 2001 R news 2:18-22) was used for this prediction. Ten rounds of 10-fold CV were performed and, for consistency, in each round and in each fold the same partition used by Logistic Regression was used by Random Forest. The classification calls obtained in an average round of 10-fold CV (the same round for which cancer status is reported in Table 3), are listed in Table 6.
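One way to arrange for two classifiers to share the same cross-validation partition, as described above, is to generate the fold assignments once and reuse them. The Python sketch below illustrates this idea only; the seeding scheme, the unstratified shuffling, and the function names are assumptions, and the study itself used R.

```python
import random

def make_folds(n_samples, n_folds=10, seed=0):
    """Return n_folds disjoint index lists that together cover all samples."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]

def cv_rounds(n_samples, n_rounds=10, n_folds=10):
    """One partition per round; each round is shuffled with its own seed."""
    return [make_folds(n_samples, n_folds, seed=r) for r in range(n_rounds)]

# 812 normal individuals plus 1,005 cancer patients.
partitions = cv_rounds(n_samples=812 + 1005)
```

Both the cancer detection model and the cancer type model would then iterate over `partitions`, so that a given sample is held out in the same fold of the same round for both.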

For determining the concordance between mutations identified in the plasma and those identified in primary tumors (Table 5), only the 155 cases were considered in which a mutation could be identified with high confidence in the plasma (Ω score>3, Table 3) and in which the primary tumor contained any mutation that was present at a mutant allele fraction of >5% (Table 1). This approach allowed us to avoid scoring tumors that had low neoplastic contents.

Sample Identification

To confirm that plasma, WBC, and primary tumor DNA samples originated from the same patient, primers were utilized that could be used to amplify 38,000 unique long interspersed nuclear elements (LINEs) from throughout the genome as described elsewhere (see, e.g., Kinde et al., 2012 PloS one 7:e41162). These 38,000 LINEs contain 26,220 common polymorphisms which can establish or refute sample identity among plasma, white blood cell and tumor samples. The genotype at each polymorphic location was identified, and the percent concordance between the samples of interest was calculated. Concordance was defined as the number of matched polymorphic sites that were identical in both samples divided by the total number of genotypes that had adequate coverage in both samples. Two samples were considered a match if concordance was >0.99 and at least 5,000 amplicons had adequate coverage.
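The concordance rule above can be sketched in Python as follows. The dictionary representation of genotype calls and the function name are hypothetical, and for simplicity the sketch counts polymorphic sites with adequate coverage in both samples where the text counts amplicons.

```python
def sample_concordance(genotypes_a, genotypes_b,
                       min_sites=5000, min_concordance=0.99):
    """Compare two samples' genotype calls keyed by polymorphic site.

    Each argument maps a site identifier to a genotype; a site is absent
    (or None) when coverage was inadequate. Returns a tuple of
    (concordance, number of shared sites, whether the samples match).
    """
    shared = [
        site for site, genotype in genotypes_a.items()
        if genotype is not None and genotypes_b.get(site) is not None
    ]
    if not shared:
        return 0.0, 0, False
    identical = sum(genotypes_a[s] == genotypes_b[s] for s in shared)
    concordance = identical / len(shared)
    is_match = concordance > min_concordance and len(shared) >= min_sites
    return concordance, len(shared), is_match
```

For instance, two samples sharing 6,000 covered sites with 30 discordant genotypes have a concordance of 0.995 and would be called a match, whereas 100 discordant genotypes (about 0.983) would not.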

Statistical Analysis

Continuous variables were reported as means and standard deviations or medians and range as deemed necessary while categorical variables were reported as whole numbers and percentages. Confidence intervals (CI) for sensitivities were calculated using a binomial distribution. P-values were calculated with a one sided binomial test using the R stats package (version 3.3.1).
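For illustration, a one-sided binomial p-value and an exact binomial confidence interval can be computed as in the Python sketch below. The text does not state which binomial interval or which null proportion was used; the Clopper-Pearson interval and the function names here are assumptions, and the study used the R stats package rather than this code.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. a one-sided p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def _bisect_increasing(f, target, tol=1e-10):
    """Find p in [0, 1] with f(p) == target, for f increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson (1 - alpha) interval for k successes in n trials."""
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: binom_upper_tail(k, n, p), alpha / 2)
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: binom_upper_tail(k + 1, n, p), 1 - alpha / 2)
    return lower, upper
```

As a usage example with hypothetical counts, `binom_upper_tail(8, 10, 0.5)` gives the probability of observing at least 8 positives in 10 trials when the null proportion is 0.5, and `exact_binomial_ci(8, 10)` gives the corresponding 95% interval for the observed proportion.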

Results

To identify a panel of protein and gene markers that might be used to detect many solid tumors at a stage prior to the emergence of distant metastases, a PCR-based assay was designed that could simultaneously assess multiple regions of driver genes that are commonly mutated in a variety of cancer types. Four challenges confronted this design. First, the test must query a sufficient number of bases to allow a large number of cancers to be detected. Second, each queried base must be sequenced thousands of times to detect mutations present at low prevalence. Third, the more bases that are queried the more likely that artifactual mutations will be identified, reducing the signal-to-noise ratio. And fourth, a test that can be implemented in a screening setting must be cost effective and high throughput, limiting the amount of sequencing that must be performed. To meet these contrasting challenges, a minimum number of short amplicons was identified that would allow detection of at least one driver gene mutation in each of the eight tumor types evaluated. Publicly available sequencing data were used to determine that there was a fractional power law relationship between the number of amplicons required and the sensitivity of detection, with a plateau at ~60 amplicons (FIG. 2). Raising the number of amplicons above a threshold level would not detect substantially more cancers but would increase the probability of false positive results. This decreasing marginal utility defined the optimal number of amplicons.

Based on these data, a 61-amplicon panel was designed with each amplicon querying an average of 33 bp within one of 16 genes (see Materials and Methods). As shown in FIG. 2, this panel would theoretically detect 41% (liver) to 95% (pancreas) of the cancers in the Catalog of Somatic Mutations in Cancer (COSMIC) dataset (Forbes et al., 2017 Nucleic Acids Res 45:D777-D783). In practice, the panel performed considerably better, detecting at least one mutation in 82%, two mutations in 47%, and more than two mutations in 8% of the 805 cancers evaluated in our study (dots in FIG. 2, FIG. 3, and Table 1). A larger fraction of tumors was detected than predicted by the COSMIC dataset because the PCR-based sequencing assay was more sensitive for detecting mutations than conventional genome-wide sequencing. Based on this analysis of the DNA from primary tumor tissues, the predicted maximum detection capability of circulating tumor DNA (ctDNA) varied by tumor type, ranging from 60% for liver cancers to 100% for ovarian cancers (FIG. 2).

Armed with this small but robust panel of amplicons, two approaches were developed that enabled the detection of the rare mutations expected to be present in plasma. First, a multiplex-PCR was used to directly and uniquely label each original template molecule with a DNA barcode. This design minimized the errors inherent to massively parallel sequencing and made efficient use of the small amount of cell-free DNA present in plasma. Second, the total amount of DNA recovered from plasma was divided into multiple aliquots and independent assays were performed on each replicate. This decreased the number of DNA molecules per well; however, it increased the fraction of each mutant molecule per well, making it easier to assay. Because the sensitivity of detection is often limited by the fraction of mutant alleles in each replicate, this partitioning strategy allowed an increase in the signal-to-noise ratio and identification of mutations present at lower prevalence than possible if all of the plasma DNA was evaluated at once.
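The effect of partitioning on the mutant fraction can be illustrated with a short arithmetic sketch. The template counts below are hypothetical and were chosen only to make the arithmetic easy to follow; the sketch also assumes that templates are divided evenly and that all mutant templates fall into a single well.

```python
def mutant_fraction_after_split(total_templates, mutant_templates, n_wells):
    """Mutant fraction in the one well that receives the mutant templates,
    assuming an even split of all templates across wells."""
    templates_per_well = total_templates / n_wells
    return mutant_templates / templates_per_well

# Hypothetical sample: 1 mutant template among 6,000 total templates.
pooled_fraction = 1 / 6000
split_fraction = mutant_fraction_after_split(6000, 1, n_wells=6)
enrichment = split_fraction / pooled_fraction
```

In this example the mutant fraction in the well containing the mutant template rises from about 0.017% to 0.1%, a six-fold increase, while the other five wells contain no mutant template.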

The second component of CancerSEEK is based on protein biomarkers. The literature was searched to find proteins potentially useful for early detection and cancer diagnosis in at least one of the eight cancer types described above with sensitivities>10% and specificities>99%. 41 potential protein biomarkers were identified and evaluated in preliminary studies on plasma samples from individuals without cancer and from cancer patients. 39 of these proteins could be reproducibly evaluated through a single immunoassay platform and these were then used to assay all plasma samples. Ten of these 39 proteins proved to be useful for discriminating cancer patients from healthy controls, and are set forth below.

Protein biomarkers analyzed and included in exemplary CancerSEEK test.

Protein                Evaluated in this study    Included in exemplary CancerSEEK test    Used for cancer type identification
AFP                    Yes                        No                                       Yes
Angiopoietin-2         Yes                        No                                       Yes
AXL                    Yes                        No                                       Yes
CA125                  Yes                        Yes                                      Yes
CA15-3                 Yes                        No                                       Yes
CA19-9                 Yes                        Yes                                      Yes
CD44                   Yes                        No                                       Yes
CEA                    Yes                        Yes                                      Yes
CYFRA 21-1             Yes                        No                                       Yes
DKK1                   Yes                        No                                       Yes
Endoglin               Yes                        No                                       Yes
FGF2                   Yes                        No                                       Yes
Follistatin            Yes                        No                                       Yes
Galectin-3             Yes                        No                                       Yes
G-CSF                  Yes                        No                                       Yes
GDF15                  Yes                        No                                       Yes
HE4                    Yes                        No                                       Yes
HGF                    Yes                        Yes                                      Yes
IL-6                   Yes                        Yes                                      Yes
IL-8                   Yes                        No                                       Yes
Kallikrein-6           Yes                        No                                       Yes
Leptin                 Yes                        No                                       Yes
LRG-1                  No                         No                                       No
Mesothelin             Yes                        No                                       Yes
Midkine                Yes                        Yes                                      Yes
Myeloperoxidase        Yes                        Yes                                      Yes
NSE                    Yes                        No                                       Yes
OPG                    Yes                        No                                       Yes
OPN                    Yes                        Yes                                      Yes
PAR                    Yes                        No                                       Yes
Prolactin              Yes                        Yes                                      Yes
sEGFR                  Yes                        No                                       Yes
sFas                   Yes                        No                                       Yes
SHBG                   Yes                        No                                       Yes
sHER2/sEGFR2/sErbB2    Yes                        No                                       Yes
sPECAM-1               Yes                        No                                       Yes
TGFa                   Yes                        No                                       Yes
Thrombospondin-2       Yes                        No                                       Yes
TIMP-1                 Yes                        Yes                                      Yes
TIMP-2                 Yes                        No                                       Yes
Vitronectin            No                         No                                       No

This study included 1,005 patients with Stage I to III cancers of the ovary, liver, esophagus, pancreas, stomach, colorectum, lung, or breast. No patient received neo-adjuvant chemotherapy prior to blood sample collection. None had evident distant metastasis at the time of study entry and all underwent surgical resection with the intent to cure. The median age at diagnosis was 64 (range 22 to 93). The eight cancer types were chosen because they are common and because no blood-based tests for earlier detection of them are in common clinical use. The histopathological and clinical characteristics of the patients are summarized in Table 2.

The most common stage at presentation was American Joint Committee on Cancer (AJCC) stage II, accounting for 49% of patients, with the remaining patients harboring stage I (20%) or stage III (31%) disease. The number of samples per stage for each of the eight tumor types is summarized below. A total of 812 individuals of median age 55 (range 17 to 88) with no known history of cancer, high-grade dysplasia, autoimmune disease, or chronic kidney disease acted as the healthy control cohort.

Cancer patients evaluated in this study by tumor type and stage.

Tumor Type    AJCC Stage    Patients (n)    Proportion of cases (%)
Breast        I             32              15
Breast        II            114             55
Breast        III           63              30
Breast        I-III         209             -
Colorectum    I             77              20
Colorectum    II            191             49
Colorectum    III           120             31
Colorectum    I-III         388             -
Esophagus     I             5               11
Esophagus     II            29              64
Esophagus     III           11              24
Esophagus     I-III         45              -
Liver         I             5               11
Liver         II            19              43
Liver         III           20              45
Liver         I-III         44              -
Lung          I             46              44
Lung          II            27              26
Lung          III           31              30
Lung          I-III         104             -
Ovary         I             9               17
Ovary         II            4               7
Ovary         III           41              76
Ovary         I-III         54              -
Pancreas      I             4               4
Pancreas      II            83              89
Pancreas      III           6               6
Pancreas      I-III         93              -
Stomach       I             21              31
Stomach       II            30              44
Stomach       III           17              25
Stomach       I-III         68              -
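As a consistency check, the per-stage counts in the table above can be summed to recover the cohort size and the overall stage proportions quoted in the text. The Python sketch below simply transcribes the table; the variable names are illustrative.

```python
# Patients per tumor type and AJCC stage, transcribed from the table above.
counts = {
    "Breast":     {"I": 32, "II": 114, "III": 63},
    "Colorectum": {"I": 77, "II": 191, "III": 120},
    "Esophagus":  {"I": 5,  "II": 29,  "III": 11},
    "Liver":      {"I": 5,  "II": 19,  "III": 20},
    "Lung":       {"I": 46, "II": 27,  "III": 31},
    "Ovary":      {"I": 9,  "II": 4,   "III": 41},
    "Pancreas":   {"I": 4,  "II": 83,  "III": 6},
    "Stomach":    {"I": 21, "II": 30,  "III": 17},
}

type_totals = {tumor: sum(stages.values()) for tumor, stages in counts.items()}
stage_totals = {
    stage: sum(stages[stage] for stages in counts.values())
    for stage in ("I", "II", "III")
}
total_patients = sum(stage_totals.values())
stage_percent = {
    stage: round(100 * n / total_patients) for stage, n in stage_totals.items()
}
```

The totals come to 1,005 patients, with 199, 497, and 309 patients in stages I, II, and III, which round to the 20%, 49%, and 31% stated above.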

CancerSEEK evaluates levels of 10 proteins and mutations in 2,001 genomic positions; each genomic position could be mutated in several ways (single base substitutions, insertions, or deletions). The presence of a mutation in an assayed gene or an elevation in the level of any of these proteins would classify a patient as positive. Rigorous statistical methods were employed to ensure the accuracy of the test. Log ratios were used to evaluate mutations and were incorporated into a logistic regression algorithm that took into account both mutation data and protein biomarker levels to score CancerSEEK test results. The mean sensitivities and specificities were determined by ten iterations of 10-fold cross-validations. The receiver operating characteristic (ROC) curve for the entire cohort of cancer patients and controls in one representative iteration is shown in FIG. 4A.

The median sensitivity of CancerSEEK among the eight cancer types evaluated was 70% (p<10⁻⁹⁶ one-sided binomial test) and ranged from 98% in liver cancers to 33% in breast cancers (FIG. 4C). At this sensitivity, the specificity was >99%, i.e., only 6 of the individuals without known cancers scored positive.

The features of the test that were most important to the algorithm were the presence of a ctDNA mutation followed by elevations of Prolactin, OPN, IL-6, CEA, CA125, HGF, Myeloperoxidase, CA19-9, Midkine, and TIMP-1 protein levels. Waterfall plots for each of the ctDNA and protein features used in CancerSEEK illustrate their distribution among individuals with and without cancer (FIG. 4). The importance ranking of the ctDNA and protein features used in CancerSEEK is provided below and a principal component analysis displaying the clustering of individuals with and without cancer is shown in FIG. 5. The complete dataset, including the levels of all proteins studied and the mutations identified in the plasma samples, is provided in Table 3 and Table 4. The probabilistic rather than deterministic nature of the approach used here to call a sample positive is evident from FIG. 6; each panel represents the sensitivity of CancerSEEK when one specific feature was excluded from the analysis.

A screening test is advantageously able to detect cancers at early stages. The median sensitivity of CancerSEEK was 73% for the most common stage evaluated (Stage II), similar (78%) for Stage III cancers, and lower (43%) for Stage I cancers (FIG. 4B). The sensitivities for the earliest stage cancers (Stage I) were highest for liver (100%) and lowest for esophageal cancer (20%).

The basis of liquid biopsy is that mutant DNA templates in plasma are derived from dying cancer cells and thus serve as exquisitely specific markers for neoplasia. To test this expectation, tumor tissue from 155 patients in whom ctDNA could be detected at statistically significant levels and in whom primary tumors were available was evaluated. The mutation in the plasma was identical to a mutation found in the primary tumor of the same individual in 138 (90%) of these 155 cases (Table 5). This concordance between plasma and primary tumor was evident in all 8 cancer types, and ranged from 100% in ovarian and pancreatic cancers to 82% in stomach cancers.

A major limitation of conventional liquid biopsies is their inability to determine the cancer type in patients who test positive, thereby posing challenges for clinical follow-up. In addition to increasing the sensitivity of detection, the combination of protein biomarkers and ctDNA helped identify the type of cancer that might exist with a positive CancerSEEK test. Supervised machine learning was used to predict the underlying cancer type in patients with positive CancerSEEK tests. The input algorithm took into account the gender of the patient and the protein and ctDNA biomarker data. One of the main purposes of such predictions is to determine the most appropriate follow up test for cancer diagnosis or monitoring after a positive CancerSEEK test. Patients with esophageal and gastric cancers were grouped together, as the optimal follow-up for individuals potentially affected with these two cancers would be endoscopy.

An algorithm was used to study the 617 patients scoring positive in the CancerSEEK Test. Without any clinical information about the patients, the source of the cancer was localized to two anatomic sites in a median of 83% of these patients (FIG. 8, Table 6; p<10⁻⁷⁷ one-sided binomial test). Furthermore, the source of the positive test was localized to a single organ in a median of 63% of these patients (FIG. 8, Table 6; p<10⁻⁴⁷ one-sided binomial test). The accuracy of prediction varied with tumor type and was best for colorectal cancers and lowest for lung cancers as shown below (see, also, FIG. 8).

Confusion matrix of top predictions from cancer type localization results.

Predicted cancer type    Breast    Colorectum    Liver    Lung    Ovary    Pancreas    Upper GI
Breast                   63%       3%            2%       8%      4%       3%          3%
Colorectum               26%       84%           30%      48%     15%      15%         44%
Liver                    0%        1%            44%      2%      0%       0%          4%
Lung                     4%        2%            0%       39%     2%       1%          4%
Ovary                    3%        0%            0%       2%      79%      0%          0%
Pancreas                 4%        2%            9%       2%      0%       81%         0%
Upper GI                 0%        9%            14%      0%      0%       0%          46%
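Localization to a single organ or to two anatomic sites, as reported above, corresponds to asking whether the true cancer type is the classifier's top prediction or among its top two. A minimal Python sketch of that calculation follows. The per-sample probability dictionaries and the function name are hypothetical, and the sketch returns an overall fraction rather than the median across cancer types reported in the text.

```python
def top_k_accuracy(probabilities, true_labels, k):
    """Fraction of samples whose true label is among the k most probable
    predicted classes.

    `probabilities` is a list of {label: probability} dicts, one per sample.
    """
    hits = 0
    for probs, truth in zip(probabilities, true_labels):
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)

# Hypothetical predictions for three positive samples.
example_probs = [
    {"Colorectum": 0.6, "Lung": 0.3, "Liver": 0.1},
    {"Colorectum": 0.5, "Lung": 0.4, "Liver": 0.1},
    {"Colorectum": 0.5, "Lung": 0.2, "Liver": 0.3},
]
example_truth = ["Colorectum", "Lung", "Lung"]
top1 = top_k_accuracy(example_probs, example_truth, k=1)
top2 = top_k_accuracy(example_probs, example_truth, k=2)
```

In the hypothetical example, one of three samples is localized correctly by the top prediction and two of three by the top two predictions.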

The results described herein demonstrate that a blood test based on genes and proteins can be used to detect a major fraction (median of 70%) of eight major cancer types. In the majority of the samples that scored positive, the underlying cancer type could be predicted simply from the test results without any prior knowledge of the patient's medical history or disease status. The specificity of CancerSEEK was high, with less than 1% of 812 individuals without known cancers scoring positive.

Example 2: Combination Approach to Liquid Biopsy Cancer Screening Tests with Increased Sensitivity and High Specificity

There is a strong correlation between tumor stage and prognosis in many cancers (see, e.g., Ansari et al., 2017 Br J Surg 104(5):600-607). Very few patients with cancers of the lung, colon, esophagus, or stomach who have distant metastasis at the time of diagnosis survive for more than five years (see, e.g., Howlader et al., 2016 SEER Cancer Statistics Review, 1975-2013, National Cancer Institute. Bethesda, Md.). It is therefore evident that earlier detection of cancers is one key to reducing deaths from these diseases.

Biomarkers in the circulation provide one of the best ways, in principle, to detect cancers at an earlier stage. Historically, the biomarkers used to monitor cancers were proteins (see, e.g., Liotta et al., 2003 Clin Adv Hematol Oncol 1(8):460-462). More recently, mutant DNA has been explored as a biomarker as DNA released from the dying cells can escape into bodily fluids such as urine, stool, and plasma (see, e.g., Haber et al., 2014 Cancer Discov 4(6):650-661; Dawson et al., 2013 N Engl J Med 368(13):1199-1209; Bettegowda et al., 2014 Science translational medicine 6(224):224ra224; Kinde et al., 2013 Science translational medicine 5(167):167ra164; Wang et al., 2015 Science translational medicine 7(293):293ra104; Wang et al., 2015 Proc Natl Acad Sci USA 112(31):9704-9709; Wang et al., 2016 Elife 5; Springer et al., 2015 Gastroenterology 149(6):1501-1510; Forshew et al., 2012 Science translational medicine 4(136):136ra168; Vogelstein et al., 1999 Proc Natl Acad Sci USA 96(16):9236-9241; and Dressman et al., 2003 Proc Natl Acad Sci USA 100(15):8817-8822). The concept underlying this approach, often called “liquid biopsies,” is that cancer cells, like normal self-renewing cells, turn over frequently. However, studies of circulating tumor DNA (ctDNA) indicate that while ctDNA is elevated in >85% of patients with advanced forms of many cancer types, a considerably smaller fraction of patients with earlier stages of cancer have detectable levels of ctDNA in their plasma (see, e.g., Bettegowda et al., 2014 Science translational medicine 6(224):224ra224; and Wang et al., 2015 Science translational medicine 7(293):293ra104).

This Example describes using a combination approach to cancer screening tests which increases the sensitivity of detection of resectable or otherwise treatable cancers under conditions that preserve high specificity. For example, the assays described in this Example combine detection of mutations in ctDNA with detection of threshold protein markers in plasma.

Materials and Methods

Plasma, White Blood Cell and Tumor DNA Samples

DNA was purified from plasma using a QIAsymphony circulating DNA kit (Qiagen, cat #1091063). Custom primers containing a unique identifier (UID) and amplicon specific sequences (Table 38) were used to amplify plasma DNA, and the resulting products were sequenced on an Illumina MiSeq or HiSeq instrument. Protein biomarker plasma concentrations were determined using Luminex bead based immunoassays on the Bioplex 200 platform (Biorad, Hercules Calif.). Plasma samples were scored as positive if the sample contained a KRAS mutation or if the concentration of CA19-9, CEA, HGF, or OPN was greater than 100 U/mL, 7.5 ng/mL, 0.92 ng/mL, or 158 ng/mL, respectively. All samples were obtained following approval by the Institutional Review Boards for Human Research at each institution and informed consent.

Samples were obtained following approval by the Institutional Review Boards for Human Research at each institution and informed consent. Patients with Stage IA, IB, IIA or IIB (considered resectable) who had had peripheral blood collected prior to surgery, had not received neoadjuvant therapy, and had undergone surgical resection at the participating institutions between April 2011 and May 2016 were included in the study. General demographics, surgical pathology, and AJCC stage (7th edition) were documented. The ‘healthy’ cohort consisted of peripheral blood samples obtained from 185 individuals of average age 64 with no history of cancer. The pancreatic cancer and healthy control samples were collected and processed in an identical manner.

DNA was purified from 3.75 mL plasma using a QIAsymphony circulating DNA kit (cat #1091063), as specified by the manufacturer. Tumor tissues were formalin-fixed and paraffin embedded (FFPE) according to standard histopathologic procedures and macro-dissected under a microscope to ensure a neoplastic cellularity of >30%. DNA was purified with a QIAsymphony DP DNA Midi Kit (Cat #937255) as specified by the manufacturer. DNA concentrations were assessed by fluorescence using SYBR Green I (Thermo Cat # S7585).

Mutation Detection and Analysis

For amplification of DNA from plasma, primer pairs were designed to amplify 66 to 80 bp segments containing regions of interest from the KRAS and TP53 genes (Table 11 and Table 12). These primers were used to amplify DNA in six independent 25 μl reactions as described elsewhere (see, e.g., Wang et al., 2015 Proc Natl Acad Sci USA 112(31):9704-9709). Reactions were purified with AMPure XP beads (Beckman Coulter, Pa., USA) and eluted in 50 μl of Buffer EB (Qiagen). A fraction (5 μl) of the purified PCR products was then amplified in a second round of PCR, as described elsewhere (see, e.g., Wang et al., 2015 Proc Natl Acad Sci USA 112(31):9704-9709). The PCR products were purified with AMPure and sequenced on an Illumina MiSeq or HiSeq 4000 instrument.

The template-specific portion of the reads was matched to reference sequences using custom scripts written in SQL and C#. Reads from a common template molecule were then grouped based on the unique identifier sequences (UIDs) that were incorporated as molecular barcodes (see, e.g., Allen et al., 2017 Ann Surg 265(1):185-191). Artefactual mutations introduced during the sample preparation or sequencing steps were reduced by requiring a mutation to be present in >90% of reads in each UID family.
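The UID-family filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original analysis used custom SQL and C# scripts that are not reproduced here, the function name `count_supermutants` and the `(uid, allele)` input layout are hypothetical, and each read is assumed to have already been reduced to the allele it shows at a single position of interest.

```python
from collections import Counter, defaultdict


def count_supermutants(reads, reference_allele, min_fraction=0.90):
    """Count UID families whose reads agree on a non-reference allele.

    reads: iterable of (uid, allele) pairs for one position (hypothetical layout).
    Returns (number of supermutant families, total number of UID families).
    """
    # Group reads that share a UID, i.e. that derive from one template molecule.
    families = defaultdict(list)
    for uid, allele in reads:
        families[uid].append(allele)

    supermutants = 0
    for alleles in families.values():
        allele, count = Counter(alleles).most_common(1)[0]
        # Require the mutation in more than 90% of the reads of the family.
        if allele != reference_allele and count / len(alleles) > min_fraction:
            supermutants += 1
    return supermutants, len(families)


example_reads = (
    [("AAAT", "T")] * 10                          # consistent mutant family
    + [("CCGA", "T")] * 8 + [("CCGA", "G")] * 2   # only 80% mutant: rejected
    + [("GGTC", "G")] * 10                        # reference family
)
n_super, n_uids = count_supermutants(example_reads, reference_allele="G")
maf = n_super / n_uids  # supermutants divided by UIDs
```

In this toy input only the first family passes the filter, so one supermutant is counted among three UID families.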

Evaluation of Plasma Proteins

The Bioplex 200 platform (Biorad, Hercules Calif.) was used to determine the concentration of multiple target proteins in the plasma samples. Luminex bead-based immunoassays were performed following the manufacturers' protocols, and concentrations were determined using 5-parameter log curve fits (using Bioplex Manager 6.0) with vendor-provided standards and quality controls. Plasma samples were diluted 6-fold for assay of CA19-9, CEA, HGF, OPN and prolactin and 5-fold for assay of midkine. Plasma samples were scored as positive if the concentration of CA19-9, CEA, HGF, or OPN was greater than 100 U/mL, 7.5 ng/mL, 0.92 ng/mL, or 158 ng/mL, respectively. The dynamic ranges of these immunoassays for CA19-9, CEA, HGF, OPN, prolactin, and midkine were 2.74-2,000 U/mL, 78.19-57,000 pg/mL, 27.43-20,000 pg/mL, 548.7-400,000 pg/mL, 137.17-100,000 pg/mL, and 13.72-10,000 pg/mL, respectively.
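As a minimal sketch of the scoring rule stated above, the following Python applies the four protein thresholds to one sample. The dictionary layout and the function names are hypothetical. Note that the dynamic ranges are quoted in pg/mL while the CEA, HGF, and OPN thresholds are quoted in ng/mL, so a unit conversion is assumed to happen before scoring.

```python
# Positivity thresholds taken from the text; CA19-9 is in U/mL, the rest in ng/mL.
THRESHOLDS = {"CA19-9": 100.0, "CEA": 7.5, "HGF": 0.92, "OPN": 158.0}


def pg_to_ng(concentration_pg_per_ml):
    """Convert a pg/mL reading to ng/mL (1 ng = 1,000 pg)."""
    return concentration_pg_per_ml / 1000.0


def positive_proteins(concentrations):
    """Return the markers whose concentration is strictly above its threshold.

    concentrations: dict mapping marker name to a value in the threshold's unit.
    Markers missing from the dict are treated as not measured and not positive.
    """
    return sorted(
        name
        for name, cutoff in THRESHOLDS.items()
        if name in concentrations and concentrations[name] > cutoff
    )


sample = {"CA19-9": 250.0, "CEA": pg_to_ng(3100.0), "HGF": 0.50, "OPN": 60.0}
flagged = positive_proteins(sample)  # markers scored as positive
```

A sample would then be called protein-positive when the returned list is non-empty.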

Algorithm for Classifying ctDNA Status

The classification of a sample's ctDNA status was obtained from a statistical test comparing the normalized mutation frequencies of the sample of interest to a distribution of normal controls. Specifically, the mutant allele fraction (MAF), defined as the ratio between the number of supermutants and the number of UIDs, was first normalized based on the observed MAFs in a set of normal controls for each mutation. Following this mutation-specific normalization, the MAF of each mutation in each well was compared to a reference distribution of MAFs built from normal controls with all mutations included, and a p-value was calculated from this distribution. The mutation with the lowest p-value among all mutations detected in a given sample was deemed the “top mutation”. The classification of a sample's ctDNA status was based on whether the p-value of this top mutation was below or above a given threshold. The threshold was selected based on a desired specificity observed among an independent set of normal controls. Thus, no training was performed on any other sample except these controls; in particular, neither the 182 healthy controls nor the 221 pancreatic cancer patients described in the main text were included in the controls used for training the algorithm.
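The classification steps above can be illustrated with the following Python sketch. Several details are assumptions, because the text does not specify them: the normalization is taken to be division by the mean control MAF for the same mutation, the p-value is taken to be an empirical upper-tail p-value against the pooled reference distribution, and all function and variable names are hypothetical.

```python
def empirical_p_value(value, reference):
    """Upper-tail empirical p-value of `value` against reference MAFs.

    The +1 terms keep the p-value above zero (an assumed convention).
    """
    exceed = sum(1 for r in reference if r >= value)
    return (exceed + 1) / (len(reference) + 1)


def classify_sample(sample_mafs, control_mean_maf, reference, p_threshold):
    """Return (top mutation, its p-value, ctDNA-positive flag) for one sample.

    sample_mafs: mutation -> MAF (supermutants / UIDs) in the sample.
    control_mean_maf: mutation -> mean MAF in normal controls (assumed > 0).
    reference: normalized MAFs pooled over all mutations in normal controls.
    """
    top_mutation, top_p = None, 1.0
    for mutation, maf in sample_mafs.items():
        normalized = maf / control_mean_maf[mutation]  # assumed normalization
        p = empirical_p_value(normalized, reference)
        if top_mutation is None or p < top_p:
            top_mutation, top_p = mutation, p
    return top_mutation, top_p, top_mutation is not None and top_p < p_threshold


reference = [0.5, 0.8, 1.0, 1.2, 1.5, 0.9, 1.1, 0.7, 1.3]
result = classify_sample(
    {"KRAS p.G12D": 0.004, "KRAS p.Q61H": 0.0001},
    {"KRAS p.G12D": 0.0001, "KRAS p.Q61H": 0.0001},
    reference,
    p_threshold=0.2,
)
```

In practice `p_threshold` would be chosen on an independent set of normal controls so that the desired specificity is met, as the text describes.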

Statistical Analysis

Continuous variables were reported as means and standard deviations or medians and ranges, as appropriate, while categorical variables were reported as whole numbers and percentages. Confidence intervals (CI) for sensitivities were calculated using a binomial distribution. Survival curves were estimated using the Kaplan-Meier method, and differences between curves were investigated with the log-rank test. Statistically significant variables in the univariate analyses were entered into a multivariable Cox proportional hazards regression model.

Hazard ratios (HR) and 95% confidence intervals (CI) for variables included in the multivariable model were reported. A p-value <0.05 was considered to be statistically significant.
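For readers who want to reproduce a binomial confidence interval for a sensitivity, the sketch below computes an exact (Clopper-Pearson) interval by bisection using only the standard library. That the exact method was the one used is an assumption; the text says only that a binomial distribution was used, although the interval of 5-57% reported for 3 of 12 Stage IA cases appears consistent with it. The function names are hypothetical.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k detected cases out of n."""

    def bisect(move_right):
        # Narrow [lo, hi] toward the p at which the tail probability is alpha/2.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if move_right(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower limit: P(X >= k | p) rises with p; find where it reaches alpha/2.
    lower = 0.0 if k == 0 else bisect(
        lambda p: 1.0 - binomial_cdf(k - 1, n, p) < alpha / 2
    )
    # Upper limit: P(X <= k | p) falls with p; find where it reaches alpha/2.
    upper = 1.0 if k == n else bisect(lambda p: binomial_cdf(k, n, p) > alpha / 2)
    return lower, upper


low, high = exact_binomial_ci(3, 12)  # e.g. 3 of 12 cases detected
```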

Results

Characteristics of Patients with PDAC and Presumed Healthy Controls

Two hundred and twenty-one patients with surgically resectable pancreatic cancer were evaluated in this study. The histopathological and clinical characteristics of these patients are summarized in Table 7. A total of 182 individuals of similar age with no known history of cancer, autoimmune disease, or chronic kidney disease acted as the healthy control cohort.

Twenty percent of the patients had no symptoms typically associated with pancreatic cancer. The size of the primary tumors at presentation ranged from 0.6 cm to 13 cm, with a median size of 3.0 cm. The most common stage at presentation was American Joint Committee on Cancer (AJCC) Stage IIB, accounting for 77% of patients, with the remaining patients harboring Stage IA (5%), Stage IB (8%) or Stage IIA (10%) disease (Table 7). Patient survival correlated with stage, as graphically depicted in FIG. 12 and as expected from prior clinical studies (see, e.g., Allen et al., 2017 Ann Surg 265(1):185-191).

PCR-Based Assay to Identify Tumor-Specific KRAS Mutations in Plasma Samples

A PCR-based assay was designed that could simultaneously assess the two codons (codons 12 and 61) of the KRAS gene that are most frequently mutated in PDAC as well as surrounding codons. The assay employed a sensitive technology called the Safe-Sequencing System (Safe-SeqS) (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108(23):9530-9535). Safe-SeqS incorporates molecular barcodes that uniquely label each template molecule, thereby drastically minimizing the errors that routinely occur in massively parallel sequencing. This approach can identify one mutant template among as many as 10,000 normal templates. Using this technology, KRAS mutations in the plasma of 66 of the 221 (30%: 95% CI 24-36%) pancreatic cancer cases (see below, and Table 8) were identified. Sixty-two (94%) and four (6%) of the mutations were at codons 12 and 61, respectively, with G>T transversions most commonly observed (Table 8). Mutations were found more frequently in Stage II patients than in Stage I patients (see below, Table 8, and FIG. 9A). Additionally, while the mutant allele frequency did not correlate with tumor size (Table 8, and FIG. 11A), mutations were found more frequently in larger tumors than in smaller tumors (see below, Table 8, and FIG. 9B).

Proportion of samples stratified by AJCC stage detected with each individual assay and all combinations thereof.

Proportion of samples detected (95% confidence interval)

Assay Type | Stage IA (12 cases) | Stage IB (17 cases) | Stage IIA (22 cases) | Stage IIB (170 cases) | Stage I&II (221 cases)
KRAS ctDNA | 25% (5-57%) | 0% (0-20%) | 18% (5-40%) | 35% (28-42%) | 30% (24-36%)
CA19-9 | 17% (2-48%) | 41% (18-67%) | 36% (17-59%) | 54% (46-62%) | 49% (43-56%)
CEA + HGF + OPN | 25% (5-57%) | 6% (0-29%) | 14% (3-35%) | 19% (14-26%) | 18% (13-24%)
KRAS ctDNA + CA19-9 | 33% (10-65%) | 41% (18-67%) | 50% (28-72%) | 65% (58-72%) | 60% (53-67%)
KRAS ctDNA Mutations + CEA + HGF + OPN | 33% (10-65%) | 6% (0-29%) | 32% (14-55%) | 47% (39-55%) | 42% (35-48%)
CA19-9 + CEA + HGF + OPN | 25% (5-57%) | 47% (23-72%) | 36% (17-59%) | 59% (52-67%) | 54% (47-61%)
Combination Assay | 33% (10-65%) | 47% (23-72%) | 50% (28-72%) | 69% (62-76%) | 64% (57-70%)

Proportion of samples stratified by tumor size detected with each individual assay and all combinations thereof.

Proportion of samples detected (95% confidence interval)

Assay Type | ≤1.5 cm (24 cases) | 1.5-2.0 cm (12 cases) | 2.0-2.5 cm (47 cases) | 2.5-3.0 cm (38 cases) | 3.0-3.5 cm (36 cases) | 3.5-4.0 cm (22 cases) | >4.0 cm (42 cases)
KRAS ctDNA | 21% (7-42%) | 17% (2-48%) | 9% (2-20%) | 32% (18-49%) | 42% (26-59%) | 45% (24-68%) | 43% (28-59%)
CA19-9 | 25% (10-47%) | 33% (10-65%) | 43% (28-58%) | 45% (29-62%) | 58% (41-74%) | 59% (36-79%) | 67% (50-80%)
CEA + HGF + OPN | 25% (10-47%) | 8% (0-38%) | 17% (8-31%) | 21% (10-37%) | 8% (2-22%) | 18% (5-40%) | 24% (12-39%)
KRAS ctDNA + CA19-9 | 38% (19-59%) | 50% (21-79%) | 47% (32-62%) | 55% (38-71%) | 78% (61-90%) | 73% (50-89%) | 74% (58-86%)
KRAS ctDNA Mutations + CEA + HGF + OPN | 38% (19-59%) | 25% (5-57%) | 26% (14-40%) | 47% (31-64%) | 47% (30-65%) | 55% (32-76%) | 50% (34-66%)
CA19-9 + CEA + HGF + OPN | 38% (19-59%) | 33% (10-65%) | 47% (32-62%) | 53% (36-69%) | 64% (46-79%) | 64% (41-83%) | 67% (50-80%)
Combination Assay | 46% (26-67%) | 50% (21-79%) | 51% (36-66%) | 61% (43-76%) | 81% (64-92%) | 77% (55-92%) | 74% (58-86%)

Protein biomarkers in various cancer types.

Cancer Type | # Cases | % CA19-9 | % CEA | % CA125 | % AFP | % Prolactin | % HGF | % OPN | % TIMP-1 | % Follistatin | % G-CSF | % CA15-3
Breast | 150 | 3% | 4% | 1% | 1% | 8% | 3% | 3% | 0% | 1% | 3% | 1%
CRC | 322 | 5% | 17% | 0% | 1% | 10% | 11% | 8% | 8% | 10% | 9% | 0%
Esophagus | 43 | 7% | 5% | 0% | 0% | 2% | 33% | 19% | 26% | 2% | 14% | 5%
Gastric | 65 | 11% | 15% | 0% | 5% | 3% | 34% | 20% | 11% | 8% | 8% | 3%
Liver | 53 | 21% | 9% | 6% | 40% | 11% | 25% | 28% | 17% | 8% | 6% | 6%
Lung | 109 | 4% | 13% | 0% | 1% | 11% | 1% | 2% | 0% | 0% | 0% | 3%
Ovarian | 86 | 13% | 1% | 12% | 3% | 20% | 3% | 3% | 10% | 1% | 0% | 19%
Pancreas | 412 | 52% | 8% | 0% | 0% | 0% | 7% | 6% | 7% | 8% | 1% | 0%

The number of mutant templates in the plasma could be calculated from the mutant allele fraction and the concentration of DNA in each plasma sample (Table 8). This number was often very low, with 15 (23%) of the patients with detectable KRAS mutations having <2 mutant templates per mL of plasma. The average number of mutant templates per mL of plasma was 9.4 (Table 8). These results emphasize that extremely sensitive techniques can be used to detect the mutations in early stage pancreatic cancer patients. KRAS mutations were only observed in one of the 182 individuals in the presumed healthy cohort, a 69-year-old male with no known cancer.

The basis for the liquid biopsy concept is that the mutant DNA templates identified in the circulation are derived from cancers. It was therefore important to determine whether the KRAS mutations identified in these patients' plasma samples were also present in their primary carcinomas. Primary carcinomas from 50 of the 66 patients with detectable KRAS mutations in their plasma were obtained. In all 50 cases, the mutation found in the plasma was identical to that found in the primary carcinoma, providing another, orthogonal measure of specificity.
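The calculation described above can be sketched as follows. The conversion from DNA mass to template count is an assumption, since the text does not state it: a value of 3.25 pg per haploid genome equivalent is used here because it appears to reproduce the values listed for TP53 later in this Example (for instance, 13.28 ng/mL at a 0.795% mutant allele frequency gives about 32.5 mutant fragments per mL). The constant and function names are hypothetical.

```python
# Assumed mass of one haploid human genome equivalent, in picograms.
PG_PER_HAPLOID_GENOME = 3.25


def mutant_templates_per_ml(dna_ng_per_ml, maf_percent):
    """Estimate mutant template molecules per mL of plasma.

    dna_ng_per_ml: plasma DNA concentration in ng/mL.
    maf_percent: mutant allele fraction expressed as a percentage.
    """
    # Total template molecules (genome equivalents) per mL of plasma.
    genome_equivalents = dna_ng_per_ml * 1000.0 / PG_PER_HAPLOID_GENOME
    # The mutant share of those templates.
    return genome_equivalents * maf_percent / 100.0


estimate = mutant_templates_per_ml(13.28, 0.795)
```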

Simultaneous Assessment of CA19-9 and KRAS Mutations in Plasma

It was sought to determine whether a combination of the KRAS ctDNA test with CA19-9, the PDAC biomarker, would result in improved sensitivity compared with the KRAS ctDNA test alone. Recent studies have shown that CA19-9 can be elevated in patients with pancreatic cancer two years prior to diagnosis (see, e.g., O'Brien et al., 2015 Clin Cancer Res 21(3):622-631). However, CA19-9 elevations have also been observed in non-malignant conditions, and 5% of the population cannot produce the CA19-9 antigen due to germline genetic variation, limiting its use for screening purposes (see, e.g., Lennon et al., 2010 Diagnostic and Therapeutic Response Markers. Pancreatic Cancer, (Springer, New York, N.Y.), pp 675-701). Nevertheless, it was reasoned that CA19-9 might prove useful as a screening biomarker if the threshold for scoring a result as positive was sufficiently high. A threshold of 100 U/mL was chosen based on prior data that this level is not found among healthy individuals who do not have a clinical history of pancreaticobiliary disease (see, Kim et al., 2004 J Gastroenterol Hepatol 19(2):182-186).

Using this predefined high threshold, CA19-9 was detected in 109 of the 221 (49%: 95% CI 43-56%) patients with pancreatic cancer, and in none of the 182 healthy controls, confirming its specificity when used in this way (Table 8, and Table 9). As expected, the number of patients with detectable CA19-9 levels increased with stage and tumor size (FIG. 9, and Table 8). A question addressed in the current study was whether these two biomarkers (KRAS mutations and a positive CA19-9 score) were independent indicators of the presence of disease. It was found that the overlap was only partial, as indicated in the Venn diagram in FIG. 10. Though 42 patients (19%) had elevated CA19-9 levels as well as detectable KRAS mutations in their plasma, 91 additional patients had either mutations in KRAS or elevated CA19-9, but not both (FIG. 10). Thus, the combined sensitivity of these analyses was 60% (95% CI 53-67%), higher than the sensitivity of either alone (FIG. 9). As such, this Example demonstrates that the two assays could be combined without substantially increasing the false positive rate because each was extremely specific at the thresholds used.
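The combined sensitivity follows from inclusion-exclusion over the counts given above (66 KRAS-positive, 109 CA19-9-positive, 42 positive for both, 221 patients in total). A short Python check, with a hypothetical function name, is:

```python
def combined_detection(n_a, n_b, n_both, n_total):
    """Inclusion-exclusion for two overlapping positive sets.

    Returns (positive for either test, positive for exactly one test,
    combined sensitivity as a fraction of n_total).
    """
    n_either = n_a + n_b - n_both      # union of the two positive sets
    n_exactly_one = n_either - n_both  # positive for one test but not both
    return n_either, n_exactly_one, n_either / n_total


either, exactly_one, sensitivity = combined_detection(66, 109, 42, 221)
```

This gives 133 patients positive for at least one test, 91 positive for exactly one, and a combined sensitivity of about 60%, matching the figures in the text.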

Increasing Sensitivity by Inclusion of Other Protein Biomarkers

Encouraged by the results described above, it was sought to further increase sensitivity by combining ctDNA KRAS mutations and CA19-9 with other protein biomarkers (Table 10). In a pilot study on a small number of pancreatic cancer samples independent from those studied here, the potential utility of other proteins that had been found to be elevated in cancer, including alpha-fetoprotein (AFP), CA15-3, leptin, IL-6, carcinoembryonic antigen (CEA), CA-125, interleukin 8 (IL-8), sFas, prolactin, osteopontin (OPN), basic fibroblast growth factor (FGF2), hepatocyte growth factor (HGF), cytokeratin-19 fragment (CYFRA 21-1), human epididymis protein 4 (HE4), transforming growth factor alpha (TGF-α), growth/differentiation factor 15 (GDF15), dickkopf-related protein 1 (DKK1), neuron specific enolase (NSE), osteoprotegerin (OPG), TIMP metallopeptidase inhibitor 1 (TIMP-1), TIMP metallopeptidase inhibitor 2 (TIMP-2), mesothelin, midkine, kallikrein-6, CD44, AXL receptor tyrosine kinase, soluble human epidermal growth factor receptor 2 (sHER2), soluble epidermal growth factor receptor (sEGFR), soluble urokinase-type plasminogen activator receptor (suPAR), and soluble platelet endothelial cell adhesion molecule (sPECAM-1) was evaluated. Of these 29 protein biomarkers, five were chosen for further analysis: CEA (see, e.g., Nazli et al., 2000 Hepatogastroenterology 47(36):1750-1752), HGF (see, e.g., Di Renzo et al., 1995 Cancer Res 55(5):1129-1138), midkine (see, e.g., Ikematsu et al., 2000 Br J Cancer 83(6):701-706), OPN (see, e.g., Koopmann et al., 2004 Cancer Epidemiol Biomarkers Prev 13(3):487-491), and prolactin (see, e.g., Levina et al., 2009 Cancer Res 69(12):5226-5233).

When the levels of these five markers were evaluated in plasmas from the 221-patient pancreatic cancer cohort, an association between the plasma concentrations of prolactin and midkine and surgical site (P<0.01, χ2 test, degrees of freedom=5) was observed, suggesting that blood collection conditions might have elevated the levels of these two markers. There was no significant correlation between CA19-9, CEA, HGF, or OPN levels and collection sites, nor was there any correlation between the presence of KRAS ctDNA mutations and collection site (P>0.01, χ2 test, degrees of freedom=5). Upon further investigation, it was noted that the levels of prolactin and midkine were significantly elevated in samples that were collected after the administration of anesthesia but before surgical excision (FIG. 14). The results on prolactin were consistent with previous studies showing that anesthetics elevate the levels of this protein (see, e.g., Thorpe et al., 2007 PLoS One 2(12):e1281). To ensure that anesthesia did not affect the levels of the other protein biomarkers described above, paired plasma samples were collected before and immediately after the administration of anesthesia in 29 new patients. The only proteins found to be elevated by anesthesia were prolactin and midkine (FIG. 15), in perfect accordance with the correlation between collection site and protein levels noted above. Prolactin and midkine were therefore excluded from further analysis.

Unlike CA19-9, no predefined threshold exists for the use of CEA, HGF, or OPN as markers for pancreatic cancer. As a result, appropriate thresholds were determined in an independent set of 273 plasma samples from healthy controls. To be conservative, thresholds for each protein that were 10% higher than the maximum values observed in any of the 273 normal plasma samples were chosen. Notably, when these thresholds were applied to the independent test set of 182 plasma samples, all three protein markers maintained 100% specificity (Table 9). The sensitivity of each of these three markers was less than that obtained with KRAS mutations or CA19-9 when each marker was used alone, but their levels were less dependent on stage and size than KRAS mutations or CA19-9 (FIG. 13, and Table 8). In combination with KRAS mutations and CA19-9 assays, this five-member biomarker panel (“combination assay”) detected 141 (64%: 95% CI 57-70%) of the 221 resectable cancers (Table 7, FIG. 9A, FIG. 10, and Table 8).
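The threshold rule described above (10% above the maximum value seen in a training set of healthy controls, then checked on an independent control set) can be sketched as follows. The function names and the toy concentration values are hypothetical and are not data from the study.

```python
def conservative_threshold(training_control_values, margin=0.10):
    """Set the cutoff a fixed margin above the highest training control value."""
    return max(training_control_values) * (1.0 + margin)


def specificity(test_control_values, threshold):
    """Fraction of independent healthy controls scored negative (<= threshold)."""
    negatives = sum(1 for value in test_control_values if value <= threshold)
    return negatives / len(test_control_values)


# Toy concentrations for one marker, for illustration only.
training_controls = [1.0, 2.0, 6.0]
cutoff = conservative_threshold(training_controls)
observed_specificity = specificity([0.5, 3.0, 6.5], cutoff)
```

Because the cutoff sits above every training value, specificity on the training set is 100% by construction, which is why the text evaluates it on a separate set of 182 samples.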

Some of the patients detectable by the combination assay were of particular note. Forty-five (20%) patients had no symptoms classically associated with pancreatic cancer (Table 8). The combination assay identified 27 (60%) of these individuals, of whom 19 (70%) had no evidence of recurrence with a median follow up of 12 (range 3-16) months (Table 8). Of the 29 patients with the earliest stages of disease (Stages IA and IB) recognized by the AJCC, 12 (41%) were detectable using the combination assay (FIG. 9A), of whom 7 (58%) had no evidence of recurrence at the study termination with a median follow up of 19 (range 2-25) months.

Another notable but sobering result from this study was that patients with poorer survival were more likely to have a positive test. Of the entire 221 pancreatic cancer patients studied, 122 (56%) patients were alive at the termination of the study, with a median follow-up of 13 (7-21) months. It was found that the combination assay provided prognostic value that was independent of conventional clinical and histopathologic features. In particular, multivariate analyses showed that the independent predictors of overall survival were combination assay status (HR=1.76, 95% CI 1.10-2.84, p=0.018), increasing age (HR=1.04, 95% CI 1.02-1.06, p=0.001), grade of differentiation (poorly differentiated, HR=1.72, 95% CI 1.11-2.66, p=0.015), lymphovascular invasion (present, HR=1.81, 95% CI 1.06-3.09, p=0.028), nodal disease (present, HR=2.35, 95% CI 1.20-4.61, p=0.013), and margin status (HR=1.59, 95% CI 1.01-2.55, p=0.050) (Table 7, FIG. 16).

TP53 in a Multiplex Assay

While nearly all pancreatic cancers harbor mutations within KRAS, a large fraction (~75%) also contains mutations in TP53 (see, e.g., Jones et al., 2008 Science 321(5897):1801-1806; Biankin et al., 2012 Nature 491(7424):399-405; and Waddell et al., 2015 Nature 518(7540):495-501). Furthermore, TP53 is the most commonly mutated gene in cancer (see, e.g., Vogelstein et al., 2013 Science 339(6127):1546-1558), making it an attractive target for ctDNA detection in future studies involving other tumor types. To determine whether the mutant allele frequencies of TP53 in the plasma correlated with those of KRAS, and also to determine whether a mutant TP53 assay in plasma might add to the sensitivity of the mutant KRAS assay, the 152 carcinomas for which matched tumor and plasma samples were available were evaluated. Mutations at one of the “hotspots” identified in genome-wide studies of PDAC (see, e.g., Jones et al., 2008 Science 321(5897):1801-1806) were first searched for. A total of 64 (42%) carcinomas contained a TP53 mutation at one of these positions. It was then determined whether these same mutations could be identified in the plasma of these 64 patients, using Safe-SeqS-based assays similar to that described above for KRAS but using primers specific for particular TP53 mutations.

TP53 mutations were identified in 13 (20%) of the 64 plasma samples (see below). Two observations were of interest. First, 12 of the 13 plasma samples containing a detectable TP53 mutation also contained a detectable KRAS mutation. Thus, TP53 mutation assays did not substantially increase sensitivity for pancreatic cancer detection, as expected from the high prevalence of KRAS mutations noted above. Second, there was a strong correlation between the mutant allele frequencies of TP53 and KRAS mutations in the plasma of the 12 patients whose plasma contained detectable amounts of both mutations (FIG. 3, Pearson's r=0.885). This provides yet another validation of the reliability of the ctDNA assay and its quantitative nature.

List of Exemplary TP53 Mutations Detected in PDAC Patients

Sample ID # | Plasma DNA concentration (ng/mL) | Mutation identified in plasma | Mutant mean allele frequency (%) | Mutant fragments/mL plasma
PANC 335 | 13.28 | TP53 p.R196*, c.586C>T | 0.795 | 32.5
PANC 336 | 13.67 | TP53 p.G266E, c.797G>A | 0.929 | 39.1
PANC 552 | 19.44 | TP53 p.R175H, c.524G>A | 0.673 | 40.3
PANC 387 | 11.11 | TP53 p.R175H, c.524G>A | 0.195 | 6.7
PANC 467 | 6.76 | TP53 p.R248Q, c.743G>A | 0.344 | 7.2
PANC 468 | 10.28 | TP53 p.S241F, c.722C>T | 0.139 | 4.4
PANC 469 | 5.98 | TP53 p.C238F, c.713G>T | 0.051 | 0.9
PANC 545 | 12.23 | TP53 p.Y236C, c.707A>G | 0.137 | 5.2
PANC 547 | 10.73 | TP53 p.Y234C, c.701A>G | 0.069 | 2.3
PANC 552 | 10.00 | TP53 p.C238Y, c.713G>A | 0.317 | 9.8
PANC 692 | 9.83 | TP53 p.V272M, c.814G>A | 0.101 | 3.1
PANCA 1105 | 11.49 | TP53 p.H193Y, c.577C>T | 0.098 | 3.5
PANCA 1109 | 27.57 | TP53 p.Y234C, c.701A>G | 0.351 | 29.8

List of Exemplary CDKN2A Mutations Detected in PDAC Patients

Sample ID # | Mutation identified in plasma | Mutation identified in tumor tissue | Mutant mean allele frequency (%)
PANC 335 PLS1 | CDKN2A p.R58*, c.172C>T | CDKN2A p.R58*, c.172C>T | 0.432
PANC 552 PLS 1 | CDKN2A p.D84G, c.251A>G | CDKN2A p.D84G, c.251A>G | 0.166
PANC 641 PLS 1 | CDKN2A g.21971208G>T (splice site) | CDKN2A g.21971208G>T (splice site) | 0.143
PANC 447 PLS | CDKN2A p.R80*, c.238C>T | CDKN2A p.R80*, c.238C>T | 0.102
PANC 398 PLS 1 | CDKN2A p.V51D, c.152T>A | CDKN2A p.V51D, c.152T>A | 0.091
PANC 609 PLS 1 | CDKN2A p.R80*, c.238C>T | CDKN2A p.R80*, c.238C>T | 0.071
PANC 455 PLS | CDKN2A p.H83Y, c.247C>T | CDKN2A p.H83Y, c.247C>T | 0.061
PANC 517 PLS 1A | CDKN2A p.R58*, c.172C>T | CDKN2A p.R58*, c.172C>T | 0.026
PANC 547 PLS 1 | CDKN2A p.R58*, c.172C>T | CDKN2A p.R58*, c.172C>T | 0.026
PANC 648 PLS 1 | CDKN2A p.A76T, c.226G>A | CDKN2A p.A76T, c.226G>A | 0.016
PANC 715 PLS 1 | CDKN2A p.M54fs, c.162>G | CDKN2A p.M54fs, c.162>G | 0.014
PANC 763 PLS 1 | CDKN2A p.H83Y, c.247C>T | CDKN2A p.H83Y, c.247C>T | 0.013
PANC 509 PLS 1 | CDKN2A p.H83Y, c.247C>T | CDKN2A p.H83Y, c.247C>T | 0.012
PANC 545 PLS 1 | CDKN2A p.R80fs, c.239ACCCG> | CDKN2A p.R80fs, c.239ACCCG> | 0.011
PANC 634 PLS 1 | CDKN2A p.A76T, c.226G>A | CDKN2A p.A76T, c.226G>A | 0.006

Example 3: Sensitivity and Specificity of ctDNA and Protein Biomarkers

Materials and Methods

Phase A

Healthy Cohort:

Ten thousand non-symptomatic women are recruited. The age range of participants is 65 to 75 years, as this range captures patients at maximum risk for cancers of the eight types that are the target of detection. Women with oophorectomies or with known cancer of any type other than non-melanoma skin cancer are excluded from the study. Plasma from each participant is obtained at study entry. In participants who test positive, as well as in a random sample of participants with negative tests, one additional sample of plasma is drawn at 3 months following the first test. When both tests are positive for the same mutation, a whole body PET/CT scan is performed. If the PET/CT exam is positive, patients are managed as deemed appropriate by their physicians. This management includes yearly follow-up ctDNA tests. Yearly follow-up of patients via electronic questionnaires, combined with phone interviews when necessary, is obtained on all individuals, whether their tests are positive or negative.

Management Cohorts:

Two hundred patients with Stage III colon cancers and 50 patients with resectable pancreatic cancers are recruited from at least 10 sites across Australia, New Zealand and Singapore. Tumor samples are tested to identify mutations that can be used for subsequent ctDNA tests. Plasma samples are collected on all patients prior to surgery (including those randomized to receive routine care). Additional plasma samples are collected at 3, 6, and 12 months in patients whose ctDNA tests are positive prior to surgery. No further ctDNA analyses are performed in ctDNA-negative patients or those randomized to receive routine care. Patients are randomized 1:1 to ctDNA-informed management (with treatment escalation or de-escalation compared to standard of care) or to routine care (blinded to ctDNA test results).

Phase B

Healthy Cohort:

Forty thousand more non-symptomatic women are recruited. The inclusion and exclusion criteria are the same as in Phase A. Other than the longer follow-up and larger number of patients, there are at least two other differences between Phase A and Phase B. First, protein biomarkers that passed the specificity threshold (>99.5%) in Phase A samples are included. Patients who test positive for one or more of these protein biomarkers are managed identically to those with positive ctDNA results. Second, a “control” Healthy Cohort is added. This cohort represents individuals recruited from the same population but in whom no blood samples are taken to assess disease or guide management. These controls are used to assess the ability of the screening tests to detect disease before it would ordinarily be detected on the basis of symptoms or standard medical care of healthy individuals. Comparisons between stage and survival in the screened and control cohorts are also made.

Management Cohorts:

One thousand more patients with Stage III colon cancers (a total of 1200 patients) and 210 more patients with resectable pancreatic cancers (for a total of 250 patients) are recruited, but now at 30 sites rather than 10 sites. Similarly to Phase A, tumor samples are collected at surgery in all new patients and mutations are identified so that they can be used in subsequent ctDNA tests. Additionally, protein biomarkers for colorectal or pancreatic cancers that passed the specificity threshold (>99.5%) in Phase A are included. Patients who test positive for these protein biomarkers are managed identically to those with positive ctDNA results. Plasma samples are collected on all patients prior to surgery and at 3, 6, and 12 months in patients whose ctDNA tests are positive prior to surgery. Patients are randomized 1:1 to ctDNA-informed management (with treatment escalation or de-escalation compared to standard of care) or to routine care (blinded to ctDNA result).

Evaluations

The Safe-SeqS-based ctDNA test that is used in the Healthy Cohort incorporates 61 amplicons representing the most commonly mutated regions of cancer driver genes. It is estimated that ~70% of the eight cancer types to be targeted have mutations in at least one of the regions covered by these amplicons. The fraction of plasmas that have mutations in at least one of these amplicons is less than 70% because not all cancers give rise to ctDNA (see, e.g., Bettegowda et al., 2014 Sci Transl Med 6(224):224ra24).

For the Management Cohort, a targeted sequencing assay is first used to identify at least one mutation in the FFPE cancer tissue obtained from surgery or biopsy. This Safe-SeqS-based test employs 128 amplicons and has been used to detect at least one pathogenic driver gene mutation in 395 of 401 colorectal cancers and in 200 of 200 pancreatic cancers tested. One specific amplicon that detects a mutation in each individual patient's tumor is then used to evaluate the plasma from that patient at the specified time points; in other words, the assay is fully personalized.

In addition to the tests on circulating tumor DNA (ctDNA), a panel of ~25 protein biomarkers is assessed using a Luminex platform on a subset of plasma samples from the Healthy Cohort. The assays for these proteins have been well-established. If the thresholds for positivity are set to a high level, they provide additional information that can be combined with the ctDNA test results to improve sensitivity without compromising specificity. Markers that exhibit >99.5% specificity in Phase A are incorporated into Phase B of the study.

Example 4: Detecting Gynecologic Malignancies Using a Combination Approach

Cancer Screening Tests

Diagnosis of gynecologic malignancy (e.g., cervical cancers, ovarian cancers, and endometrial cancers) often includes the Papanicolaou (Pap) test, transvaginal ultrasound (TVUS), and/or detection of the CA-125 biomarker. However, screening with current diagnostic approaches is not recommended for the general population, as it leads to “important harms, including major surgical interventions in women who do not have cancer” (see, e.g., Moyer, 2012 Annals of internal medicine 157:900-904). Thus, new diagnostic approaches are urgently needed.

This Example describes a new test, called PapSEEK, which addresses the problematic issues described above. In this test, DNA from sampled fluid (e.g., fluid containing cells sampled from the endocervical canal) can be used in an assay (e.g., a PCR-based, multiplex test) to simultaneously assess genetic alterations that commonly occur in endometrial or ovarian cancers (FIG. 19). Overall, 1915 samples from 1658 individuals were included in the studies described herein, including 656 patients with endometrial or ovarian cancers and 1002 healthy controls. The age, race, histopathologic diagnosis, stage and other clinical information for the cancer patients are provided in Table 13. The samples tested from these patients are listed in Table 14.

Materials and Methods

Patient Samples

All samples for this study were obtained according to protocols approved by the Institutional Review Boards of the Johns Hopkins Medical Institutions (Baltimore, Md.), McGill University (Montreal, QC, Canada), Gothenburg University (Gothenburg, Sweden), BioreclamationIVT (Chestertown, Md.), Memorial Sloan Kettering Cancer Center (New York City, N.Y.), and the Danish Scientific Ethical Committee (Copenhagen, Denmark). Demographic, clinical, and pathologic staging data collected for each patient with cancer are listed in Table 13. The average age of 714 women without cancer used for Pap brush analysis was 34 (range: 17 to 67 years). The average age of 125 women without cancer used for Tao brush analysis was 29 (range: 18 to 74 years). All histopathology was re-reviewed by board-certified pathologists. DNA was extracted from tumors, Pap smear fluid, and plasma as described elsewhere (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108:9530-9535; and Bettegowda et al., 2014 Sci Transl Med 6:224ra24). For intrauterine sampling, a Tao Brush IUMC Endometrial Sampler (Cook Medical Inc., Bloomington, Ind.) was gently inserted to the level of the uterine fundus. The outer sheath was then pulled back and the brush was rotated 360 degrees clockwise and then counterclockwise. The outer sheath was then pushed forward again and the device was removed. The sample was placed into Thin-Prep buffer, from which DNA was purified using an AllPrep DNA kit (Qiagen, Germany) according to the manufacturer's instructions. Purified DNA from all samples was quantified as described elsewhere (see, e.g., Rago et al., 2007 Cancer research 67:9364-9370).

Healthy controls included patients with normal cytology findings on Pap smears and no history of gynecologic tumors. Ovarian cancer patients with a history of tubal ligation were excluded from the study.

Somatic Mutation Detection and Analysis

DNA from Pap smear fluid, Tao brush samples, or primary tumors wasamplified in three multiplex PCR reactions consisting of 139 primerpairs that were designed to amplify 110 to 142 bp segments, as describedelsewhere (see, e.g., Wang et al., 2016 Elife 5). These segments containregions of interest from the following 18 genes: AKT1, APC, BRAF,CDKN2A, CTNNB1, EGFR, FBXW7, FGFR2, KRAS, MAPK1, NRAS, PIK3CA, PIK3R1,POLE, PPP2R1A, PTEN, RNF43, and TP53. For each sample, three multiplexreactions, each containing non-overlapping amplicons, were performed, asdescribed elsewhere (see, e.g., Wang et al., 2016 Elife 5). Each samplewas assessed in two duplicate wells. DNA from plasma was amplified intwo multiplex PCR reactions consisting of 61 primer pairs that weredesigned to amplify 67 to 81 bp segments. Each sample was assessed insix duplicate wells. These segments contained regions of interest fromthe following genes: AKT1, APC, BRAF, CDKN2A, CTNNB1, EGFR, FBXW7,FGFR2, GNAS, HRAS, KRAS, NRAS, PIK3CA, PPP2R1A, PTEN, and TP53.

Safe-SeqS, an error-reduction technology for detection of low-frequency mutations (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108:9530-9535), was used for all sequencing analyses. One primer in each pair included a unique identifier sequence (UID), consisting of 14 degenerate bases with an equal chance of being an A, C, T, or G. High-quality sequence reads were selected based on quality scores, which were generated by the sequencing instrument to indicate the probability that a base was called in error. Reads from a common template molecule were then grouped based on the UIDs that were incorporated as molecular barcodes. Artifactual mutations introduced during the sample preparation or sequencing steps were reduced by requiring a mutation to be present in >90% of reads in each UID family (i.e., to be scored as a “supermutant”) (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108:9530-9535).
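The UID grouping and supermutant rule described above can be illustrated with a minimal Python sketch. It is an illustration only, not the Safe-SeqS implementation: the function name, the input layout (one `(uid, base)` tuple per read at a single genomic position), and the `reference_base` argument are assumptions introduced for this example.

```python
from collections import Counter, defaultdict

def find_supermutants(reads, reference_base, min_fraction=0.90):
    """Return {uid: mutant_base} for UID families that carry a supermutant.

    reads          -- iterable of (uid, base) tuples at one position (hypothetical layout)
    reference_base -- the expected, non-mutant base at that position
    min_fraction   -- a mutation must appear in MORE than this fraction of a family
    """
    # Group reads that share a UID, i.e. that derive from one template molecule.
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)

    supermutants = {}
    for uid, bases in families.items():
        base, count = Counter(bases).most_common(1)[0]
        # Keep the family only if a non-reference base dominates it (>90% of reads).
        if base != reference_base and count / len(bases) > min_fraction:
            supermutants[uid] = base
    return supermutants
```

A mutation seen in only 8 of 10 reads of a family would be discarded under this rule, since errors introduced late in PCR or during sequencing tend to affect only a subset of the reads in a family. A minimum family size could be added as a further filter; the text above does not specify one.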

Statistical Analysis of Sequencing Data

All Pap brush and Tao brush samples were analyzed using a mutant allele fraction (MAF)-based approach. Mutations were considered if they met one of the following two criteria: (i) present in the COSMIC database (see, e.g., Forbes et al., 2017 Nucleic Acids Res 45:D777-D783), or (ii) predicted to be inactivating in tumor suppressor genes (nonsense mutations, out-of-frame insertions or deletions, canonical splice site mutations). Synonymous mutations, except those at exon ends (see, e.g., Jung et al., 2015 Nat Genet 47:1242-1248), and intronic mutations, except for those at splice sites, were excluded. The MAF in the sample of interest was first normalized based on the distribution of MAFs for the same mutation in the control group. Following this mutation-specific normalization, a p-value was obtained by comparing the MAF of each mutation in each well with a reference distribution of MAFs built from normal controls where all mutations were included. Stouffer's Z-score was then calculated from the p-values of the two wells, weighted by their number of UIDs.
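A weighted Stouffer combination of per-well p-values can be sketched as follows. This is a sketch under stated assumptions: the text says only that the wells were weighted by their number of UIDs, so using the raw UID count as the weight (rather than, for example, its square root), treating the p-values as one-sided, and the function name are all assumptions of this example.

```python
from math import sqrt
from statistics import NormalDist

def weighted_stouffer_z(p_values, uid_counts):
    """Combine one-sided p-values from replicate wells into a single Z-score.

    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)), with z_i = Phi^{-1}(1 - p_i).
    The weights w_i are the per-well UID counts (an assumption of this sketch).
    """
    standard_normal = NormalDist()
    eps = 1e-300  # keep p strictly inside (0, 1) so inv_cdf is defined
    z_scores = [
        standard_normal.inv_cdf(1.0 - min(max(p, eps), 1.0 - 1e-16))
        for p in p_values
    ]
    numerator = sum(w * z for w, z in zip(uid_counts, z_scores))
    denominator = sqrt(sum(w * w for w in uid_counts))
    return numerator / denominator
```

With two wells of equal UID count, each giving p = 0.025, the combined Z-score is the single-well Z-score multiplied by the square root of two, so concordant evidence in both wells strengthens the call.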

A sample was scored as positive when any of its mutations had a value above the corresponding thresholds for any of the following three criteria: 1) the difference between its MAF and the corresponding maximum MAF observed for that mutation in the controls, 2) the ratio of its Stouffer's Z-score to the average of the highest six non-zero Stouffer's Z-scores for the same mutation in the controls, or 3) its Stouffer's Z-score alone when the mutation was not seen in the controls.
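The three-criterion rule above can be written as a short Python sketch. The threshold values are not given in the text, so the `thresholds` dictionary, its keys, and the function names are hypothetical placeholders. Treating "not seen in the controls" as "all control MAFs are zero" is likewise an assumption of this sketch.

```python
from statistics import mean

def mutation_is_positive(maf, z, control_mafs, control_zs, thresholds):
    """Apply the three scoring criteria to one mutation (illustrative only)."""
    seen_in_controls = any(m > 0 for m in control_mafs)
    if not seen_in_controls:
        # Criterion 3: Z-score alone when the mutation was never seen in controls.
        return z > thresholds["z_alone"]
    # Criterion 1: excess of the sample MAF over the maximum control MAF.
    if maf - max(control_mafs) > thresholds["maf_difference"]:
        return True
    # Criterion 2: ratio of Z to the mean of the six highest non-zero control Z-scores.
    top_six = sorted((c for c in control_zs if c != 0), reverse=True)[:6]
    if top_six and mean(top_six) > 0:
        return z / mean(top_six) > thresholds["z_ratio"]
    return False

def sample_is_positive(mutations, thresholds):
    """A sample is positive when any of its mutations passes any criterion."""
    return any(mutation_is_positive(thresholds=thresholds, **m) for m in mutations)
```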

Sensitivity and specificity were obtained from a 10-fold cross validation. In each round, Pap brush samples from 90% of the 714 women without cancer served as controls. In each of the ten rounds, the remaining 10% of the Pap brush samples from women without cancer were scored to obtain specificity. All other samples were scored once in each of the 10 rounds, for a total of ten times, and were considered to be positive overall if they scored positive at least half of the time (i.e., in 5 or more rounds). The mutations in the samples that scored positive are listed in Table 15.
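The cross-validation bookkeeping described above can be sketched in Python. How the controls were actually partitioned is not stated, so the seeded random shuffle and the function names are assumptions of this example.

```python
import random

def ten_fold_splits(control_ids, n_folds=10, seed=0):
    """Yield (reference_controls, held_out_controls) for each round.

    Each round uses about 90% of the controls as the reference set and holds
    out the remaining ~10%, which are scored to estimate specificity.
    """
    ids = list(control_ids)
    random.Random(seed).shuffle(ids)  # assumed partitioning scheme
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        reference = [c for c in ids if c not in held]
        yield reference, held_out

def overall_call(round_calls, min_positive_rounds=5):
    """A case sample is positive overall if positive in 5 or more of the rounds."""
    return sum(bool(c) for c in round_calls) >= min_positive_rounds
```

Each control is held out exactly once, so every control contributes one call to the specificity estimate, whereas each case sample receives ten calls that are reduced to one by `overall_call`.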

The analysis of the plasma samples was done using an empirical Bayes approach. A Beta distribution was first fitted based on the MAFs in a set of controls using maximum likelihood estimation. The MAFs of all mutations were then adjusted accordingly. A p-value was calculated for each mutation in each well by comparison to the distribution of adjusted MAFs among the controls. An overall p-value for every mutation was obtained as the product of the p-values from all 6 wells. Sensitivity and specificity were obtained from a 10-fold cross validation. In each round, normal plasma samples from 192 healthy individuals served as controls, as for the Pap brush samples described above. A sample was considered to be positive if it was positive in 5 or more rounds. The mutations in the samples that scored positive are listed in Table 17.

Confidence intervals for sensitivities and specificities were calculated assuming binomial distributions with the actual sensitivities and specificities set as the corresponding success probabilities.
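One common way to obtain a binomial confidence interval is the exact (Clopper-Pearson) method, sketched below. The text does not name the interval method that was used, so the choice of Clopper-Pearson is an assumption, as are the function names. The sketch uses only the standard library and is intended for sample sizes of roughly the magnitude used here (up to about a thousand).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p increases, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2.0
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower bound: the p at which P(X >= successes) equals alpha / 2.
    lower = 0.0 if successes == 0 else solve(successes - 1, 1.0 - alpha / 2.0)
    # Upper bound: the p at which P(X <= successes) equals alpha / 2.
    upper = 1.0 if successes == n else solve(successes, alpha / 2.0)
    return lower, upper
```

For 125 negatives among 125 controls, this method gives a lower bound of 0.025^(1/125), or about 0.971. That is consistent with, though it does not confirm, the interval of 97% to 100% reported for the Tao brush specificity.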

Aneuploidy Detection and Analysis

For each sample, a single primer pair was used to amplify ~38,000 loci of long interspersed nucleotide elements (LINEs) throughout the genome (see, e.g., Kinde et al., 2012 PLoS One 7:e41162). Massively parallel sequencing was performed on Illumina instruments. One of the primers included a UID to serve as a molecular barcode, as described above, to reduce error rates associated with PCR and sequencing. The sequencing data were then processed to identify significant single chromosomal arm gains or losses, as well as allelic imbalance on 39 chromosome arms, using Within-Sample AneupLoidy DetectiOn (WALDO) software (see, e.g., Example 6). WALDO incorporates a support vector machine (SVM) to discriminate between aneuploid and euploid samples. The SVM was trained using 3150 synthetic aneuploid samples with low neoplastic content and 677 euploid peripheral white blood cell (WBC) samples. A sample was scored as positive (aneuploid) if the SVM discriminant score exceeded a given threshold, or if significant gains of chromosome arms 7q and 8q were observed. These chromosome arms are frequently gained in both endometrial and ovarian cancers (see, e.g., Cancer Genome Atlas Research, 2013 Nature 497:67-73; and Cancer Genome Atlas Research, 2011 Nature 474:609-615).

Results

Evaluation of Somatic Mutations in Pap Brush Samples from Patients with Endometrial or Ovarian Cancer

The amount of DNA shed from neoplastic cells was expected to be a minor fraction of the total DNA in the Pap brush samples, with most DNA emanating from normal cells. Therefore, a sensitive, PCR-based error-reduction technology, called Safe-Sequencing System (Safe-SeqS), was used to identify mutations in these samples (see Methods and Materials). In brief, primers were designed to amplify 139 regions, covering 9,392 distinct nucleotide positions within the 18 genes of interest (Table 39). Three multiplex PCR reactions, each containing non-overlapping amplicons, were then performed on each sample.

This assay was applied to Pap brush samples of 382 women with endometrial cancer, 245 women with ovarian cancer, and 714 women without cancer. It was found that 81% of the patients with endometrial cancers had detectable mutations, including 78% of patients with early-stage (stages I and II) disease and 89% of the patients with late-stage disease (stages III and IV; Table S2). The most commonly mutated genes were PTEN (64%), TP53 (41%), PIK3CA (31%), PIK3R1 (29%), CTNNB1 (21%), KRAS (18%), FGFR2 (11%), POLE (9%), APC (9%), FBXW7 (8%), RNF43 (7%), and PPP2R1A (5%). The median mutant allele fraction (MAF) was 4.0% (95% confidence interval (CI): 3.5% to 4.5%) (Table 15).

Twenty-nine percent of the 245 ovarian cancer patients harbored detectable mutations in their Pap brush samples. These included 28% of patients with early-stage disease and 30% of patients with late-stage disease (Table 14). The most commonly mutated gene was TP53 (74%). The median MAF was 0.54% (95% CI: 0.4% to 0.87%) (Table 15). When this assay was applied to the 714 women without cancer, 1.3% were found to have a detectable mutation, yielding a specificity of 98.7% (95% CI: 97.6% to 99.4%) (FIG. 20).

Tumor tissue was available from 83% and 84% of endometrial and ovarian cancer patients who donated Pap brush samples, respectively. Using the same multiplex assay applied to the Pap brush samples, a driver gene mutation was identified in 98% and 82% of the endometrial and ovarian cancer tissues, respectively (Table 16). Of the endometrial and ovarian cancer patients with a driver mutation identified in their primary tumor, 85% and 29%, respectively, had mutations in their Pap brush samples. Conversely, of the positive Pap brush samples from patients with endometrial or ovarian cancers, 93% contained at least one driver gene mutation that was identical to that observed in their primary tumor. The fraction of Pap brush samples with mutations that were also found in the primary tumors was higher in endometrial cancer patients (97%) than in ovarian cancer patients (73%).

Evaluation of Aneuploidy in Pap Brush Samples

In addition to somatic mutations, aneuploidy is found in the great majority of endometrial and ovarian cancers (see, e.g., Cancer Genome Atlas Research, 2013 Nature 497:67-73; Cancer Genome Atlas Research, 2011 Nature 474:609-615; and Vogelstein et al., 2013 Science 339:1546-1558). To assess aneuploidy, a PCR-based method was used to amplify ~38,000 loci of long interspersed nucleotide elements (LINEs) with a single primer pair. LINEs have spread throughout the genome via retrotransposition and are found on all 39 non-acrocentric autosomal arms. After sequencing, the data were processed to identify gains or losses on single chromosome arms.

Aneuploidy was detected in the Pap brush samples of 38% (n=382) of patients with endometrial cancer, including 34% and 51% of those with early- and late-stage disease, respectively (Table 14). Aneuploidy was also detected in the Pap brush samples of 11% (n=245) of ovarian cancer patients, including 15% and 9.3% of those with early- and late-stage disease, respectively (Table 14). In endometrial and ovarian cancers, the most commonly altered arms were 4p, 7q, 8q, and 9q. In contrast, when the aneuploidy assay was applied to the Pap brush samples of 714 women without cancer, only one woman was positive (FIG. 20).

Even if a sample does not contain a genetic alteration in one of the 18 genes assessed, it might still be aneuploid and detectable by methods provided herein. This conjecture was supported by identification of six patients (three with endometrial and three with ovarian cancers) who had no mutations in their Pap brush samples or primary tumors (when available), but whose Pap brush samples displayed aneuploidy. The combined test, incorporating the above-described assays for mutations plus aneuploidy, was dubbed “PapSEEK.” PapSEEK scores a sample as positive if it harbors either a mutation or an abnormal chromosome arm number. Eighty-one percent of the Pap brush samples from women with endometrial cancers were PapSEEK-positive, including 78% of patients with early-stage disease and 92% of patients with late-stage disease (FIG. 21 and FIG. 22). Thirty-three percent of the Pap brush samples from women with ovarian cancers were PapSEEK-positive, including 34% of patients with early-stage disease and 33% of patients with late-stage disease (FIG. 21 and FIG. 22). Only 1.4% of the Pap brush samples from 714 women without cancer were PapSEEK-positive, yielding a specificity of 98.6% (95% CI: 97.4% to 99.3%) (FIG. 20).
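The PapSEEK decision rule, combined with the aneuploidy scoring rule given in the Methods, reduces to two logical ORs, as in the following minimal sketch. The function and parameter names are hypothetical, and reading "gains of chromosome arms 7q and 8q" as requiring both arms to be gained is an assumption of this example.

```python
def aneuploidy_positive(svm_score, svm_threshold, gain_7q, gain_8q):
    """Aneuploid if the SVM score exceeds the threshold, or if 7q and 8q are both gained."""
    return svm_score > svm_threshold or (gain_7q and gain_8q)

def papseek_positive(has_mutation, is_aneuploid):
    """PapSEEK-positive if the sample harbors a mutation or is aneuploid."""
    return bool(has_mutation or is_aneuploid)
```

Because the two assays are combined with an OR, the sensitivity of the combined test cannot be lower than that of either assay alone, while its false-positive rate can be as high as the sum of the two individual rates.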

Evaluation of Tao Brush Samples from Patients with Ovarian or Endometrial Cancers

A more direct, minimally invasive sampling of the intrauterine cavity (rather than the endocervical canal) might increase the sensitivity of this approach for detecting gynecologic cancers. To explore this possibility, intrauterine samples were collected using a Tao brush, which is a flexible, narrow brush covered by a retractable outer sheath that allows direct sampling of the entire endometrial cavity without injury to the myometrium or contamination from the cervical canal. It has been approved by the Food and Drug Administration for endometrial sampling and can be used in an outpatient setting without the need for anesthesia. Advantageously for a potential screening test, it is well tolerated by patients.

PapSEEK was applied to Tao brush samples collected from 123 patients with endometrial cancers, 51 patients with ovarian cancers, and 125 women without cancer. Ninety-three percent of the Tao brush samples from endometrial cancer patients contained genetic alterations detected by PapSEEK, including 90% and 98% of patients with early- and late-stage disease, respectively (FIG. 22). The most commonly mutated genes in the Tao brush samples were PTEN (63%), TP53 (42%), PIK3CA (36%), PIK3R1 (20%), KRAS (17%), CTNNB1 (15%), FGFR2 (15%), RNF43 (11%), PPP2R1A (7%), POLE (7%), and FBXW7 (6%), similar to those observed in the Pap brush samples. The median MAF was 24.7% (95% CI: 21.3% to 26.9%), considerably higher than that observed in the Pap brush samples, in which the median MAF was 4.0% (95% CI: 3.5% to 4.5%; Table S4).

Genetic alterations detectable by PapSEEK were found in 45% (95% CI: 31% to 60%) of the Tao brush samples from 51 women with ovarian cancers, including 47% and 44% of patients with early- and late-stage disease, respectively (FIG. 22). The most commonly mutated gene was TP53 (86%), consistent with the data on Pap brush samples. The median MAF was 0.88% (95% CI: 0.61% to 2.8%), which was higher than in the Pap brush samples (median 0.54%; 95% CI: 0.4% to 0.87%; Table 15).

PapSEEK was applied to the Tao brush samples from 125 women without cancer. None (0%) of these women tested positive for mutations, yielding a specificity of 100% (95% CI: 97% to 100%; FIG. 20).

Tao brush and Pap brush samples from the same woman were available for 145 patients (103 with endometrial and 42 with ovarian cancers). In endometrial cancers, PapSEEK was positive in 91% of the Tao brush samples and in 81% of the Pap brush samples (p=0.02, mid-P McNemar test). Similarly, the fraction of ovarian cancer patients with a positive PapSEEK test was higher for Tao brush (45%) than for Pap brush (17%; p=0.002, mid-P McNemar test; Table 13).
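The mid-P McNemar test used for these paired comparisons depends only on the discordant pairs, i.e., patients positive with one brush but not the other. A minimal Python sketch follows. The function name is hypothetical, and because the discordant counts are not reported in the text, the counts in the usage note are invented purely to show the arithmetic.

```python
from math import comb

def mcnemar_mid_p(b, c):
    """Two-sided mid-P McNemar test for paired binary outcomes.

    b -- pairs positive with test A only
    c -- pairs positive with test B only
    Under the null hypothesis, min(b, c) ~ Binomial(b + c, 0.5). The mid-P value
    counts only half of the probability of the observed outcome.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)

    def pmf(i):
        return comb(n, i) / 2**n

    lower_tail = sum(pmf(i) for i in range(k))  # P(X < k)
    return min(1.0, 2.0 * (lower_tail + 0.5 * pmf(k)))
```

For example, with hypothetical discordant counts of 1 and 9, `mcnemar_mid_p(1, 9)` evaluates to 12/1024, or about 0.012.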

Tumor tissue was available from 90% and 88% of patients with endometrial and ovarian cancers who donated Tao brush samples, respectively. PapSEEK identified driver gene mutations in 97% and 80% of the endometrial and ovarian cancer tissues, respectively (Table 16). Of the endometrial and ovarian cancer patients with a driver mutation identified in their primary tumor, 93% and 42%, respectively, had mutations detectable in their Tao brush samples. Conversely, of the positive Tao brush samples from patients with endometrial or ovarian cancers, 91% contained at least one driver gene mutation that was identical to that observed in their primary tumor. The fraction of Tao brush samples with mutations that were also found in the primary tumors was higher in endometrial cancer patients (97%) than in ovarian cancer patients (53%).

Evaluation of ctDNA in Patients with Ovarian Cancers

Ovarian cancers that were inaccessible by Pap or Tao brush sampling due to anatomical or other factors might be detectable by circulating tumor DNA (ctDNA) in plasma. This was tested in 83 ovarian cancer patients who had donated both Pap brush and plasma samples. Due to the smaller size of degraded ctDNA, primers were designed to amplify short 67 to 81 bp DNA fragments, covering 1,931 distinct nucleotide positions within 16 genes of interest. To demonstrate the specificity of this assay, it was applied to plasma samples from 192 healthy individuals; none (0%) tested positive, yielding a specificity of 100% (95% CI: 98% to 100%).

It was found that 43% (95% CI: 33% to 55%) of the plasma samples from the 83 patients with ovarian cancers had detectable ctDNA. The mutations detected are listed in Table 17. As expected, the sensitivity for ctDNA in plasma was higher in patients with late-stage tumors than in those with early-stage tumors (56% vs. 35%; FIG. 23). For early-stage disease, the median MAF in the plasma was 0.85%, which was less than the median MAF (5.7%) in the Pap smears. At least one of the mutations identified in the plasma could be identified in 88% of the corresponding primary tumors.

In the Pap brush samples from this same cohort of 83 patients, 40% were positive by the PapSEEK test. The individuals scoring positive in their Pap brush and plasma samples only partially overlapped (FIG. 21). As a result, 63% (95% CI: 51% to 73%) of patients were positive with at least one of the two tests. Those who tested positive included 54% of patients with early-stage disease and 75% of patients with late-stage disease (Table 13, FIG. 23).

Discussion

As described herein, a multiplex PCR-based test (PapSEEK) was designed and applied to detect genetic alterations in Pap brush or Tao brush samples. These samples are minimally invasively and conveniently obtained during routine office visits. The majority of endometrial cancers could be detected with PapSEEK: 93% with Tao Brush and 81% with Pap brush. A substantial fraction of ovarian cancers could also be detected with PapSEEK: 45% with Tao Brush and 33% with Pap brush. The specificity of PapSEEK was high, with only 0% and 1.4% of women without cancer testing positive with Tao and Pap brush samples, respectively (FIG. 24). It was also demonstrated that assays for ctDNA in plasma could be used in conjunction with PapSEEK on Pap brush samples, increasing the sensitivity of detecting ovarian cancer to 63%.

It was notable that the sensitivity for detecting early-stage ovarian cancers was as high as that for late-stage disease (47% vs. 44% for Tao; 34% vs. 33% for Pap). Without wishing to be bound by theory, there are at least two possible explanations for this unexpected but enticing finding. First, it has been shown that some ovarian cancers originate in the fallopian tubes, which could facilitate their early detection with PapSEEK when tumor cells are shed into the uterine cavity. Second, in late-stage tumors, the fallopian tubes are often matted and obliterated by the disease and thus less likely to serve as a conduit for tumor cells to pass into the uterus or endocervical canal. In this setting, the addition of ctDNA analysis in plasma to Pap or Tao brush sampling may be particularly beneficial.

A subset of samples tested herein was composed of high-grade, early-stage cancers. Currently available diagnostic modalities have low sensitivities for these lesions (see, e.g., Fishman et al., 2005 Am J Obstet Gynecol 192:1214-1221; Sharma et al., 2012 Ultrasound Obstet Gynecol 40:338-344; and Hamilton et al., 2006 British journal of cancer 94:642-646). Though the high-grade subtypes comprise only about 10% of incident endometrial cancers, they account for more than 40% of deaths from the disease (see, e.g., Moore et al., 2011 Clin Obstet Gynecol 54:278-291). As these high-grade cancers often arise from a background of atrophic endometrium and can metastasize prior to visible abnormalities on imaging, transvaginal ultrasound has a limited role in screening and early diagnosis. Thus it was encouraging that PapSEEK detected 85% (n=34) and 89% (n=9) of high-grade endometrial cancers confined to the endometrium in the Pap and Tao brush samples, respectively. In the case of ovarian cancers, the tested cohort included only a small number of early-stage, high-grade cases, consistent with the unfortunate fact that these cancers are often diagnosed only at advanced stages. Nevertheless, the finding that 36% (n=11) were positive with combined Pap and plasma sample testing, and that 80% (n=5) were positive in Tao brush samples, is notable.

The study described herein was retrospective. The samples that were examined were derived from patients with known cancers, even though a substantial fraction was from patients with early-stage lesions. In a screening setting, the cancers would advantageously be at an earlier stage, and the sensitivities for detection would be expected to be closer to the sensitivity for early-stage cancers observed in the present study. Moreover, the age ranges of the controls and cases are typically better matched in a prospective study than in the present retrospective study. Some of the ovarian cancer patients who had mutations detectable in their Pap brush or Tao brush samples did not have the identical mutations in their primary tumors. This was not an issue with endometrial cancers, wherein at least one mutation in the brush samples was nearly always (97%) found in the corresponding primary tumors. But this phenomenon was observed in ovarian cancer patients, particularly with the Tao brush. At least one mutation identifiable in the Pap brush could be identified in 73% of the corresponding primary ovarian tumors, while the same was true for only 53% of the Tao brush samples.

Without wishing to be bound by theory, one possible explanation for the discordance between the mutations in brush samples and ovarian cancers from the same patients is that the assay detects mutations that do not exist in vivo, representing technical artifacts. It is not believed that this is likely, however, given that the specificity of the assays was 100% and 99% in Tao brush and Pap brush samples, respectively, from women without cancer. Another possible explanation is tumor heterogeneity. Only a small part of the tumors that were analyzed was sampled and sequenced, and the additional mutations found in the Pap smear or intrauterine samples could represent mutations from other parts of the tumor. It is also possible that some mutations were from small synchronous endometrial cancers or early, premalignant endometrial lesions that were unnoted by the pathologist. A significant proportion of women with ovarian cancer have synchronous endometrial cancer, with risk factors including Lynch syndrome, polycystic ovarian syndrome, perimenopause, obesity, nulliparity, and unopposed estrogen replacement therapy (see, e.g., Al Hilli et al., 2012 Gynecologic oncology 125:109-113; Walsh et al., 2005 Obstetrics and gynecology 106:693-699; Zaino et al., 2001 Gynecologic oncology 83:355-362; and Song et al., 2014 Int J Gynecol Cancer 24:520-527).

Though tumor heterogeneity or multiple synchronous tumors are feasible explanations that are often used to explain discordances in liquid biopsy studies, without wishing to be bound by theory, it is possible that clonal expansions of non-malignant cells may play a role in the present observations. Clonal proliferations that are not considered neoplastic have been described in the uterine lavage, bone marrow, skin, and other tissues (see, e.g., Steensma et al., 2015 Blood 126:9-16; Coombs et al., 2017 Cell Stem Cell 21(3):374-382; Young et al., 2016 Nat Commun 7:12484; Krimmel et al., 2016 Proc Natl Acad Sci USA 113:6005-6010; and Nair et al., 2016 PLoS Med 13:e1002206). Of particular interest are the clonal proliferations of endometrial cells that cause endometriosis, a sometimes debilitating condition that affects millions of women. It has recently been shown that these lesions, which can occur throughout the abdomen and are derived from endometrium, are clonal proliferations that can be driven by the same mutations detected in endometrial cancers (see, e.g., Anglesio et al., 2017 N Engl J Med 376:1835-1848). Without wishing to be bound by theory, it is possible that the hormonal and physiologic changes contributing to or resulting from ovarian cancers stimulate or select for such clonal proliferations in the endometrial lining. On one hand, this possibility argues against the exquisite specificity that is the conceptual basis for all liquid biopsies. On the other hand, it could actually enhance the sensitivity of detection of ovarian cancers, without diminishing specificity, if large clonal proliferations are almost exclusively found in women with gynecologic malignancies. Clonal proliferations that account for >0.03% of the total cells in the endometrial lining are detectable by methods provided herein.

Example 5: Detecting Urologic Malignancies Using a Combination Approach

Cancer Screening Tests

According to the American Cancer Society, 79,030 new cases of bladder cancer (BC) and 18,540 deaths are estimated to occur in the United States alone in 2017 (see, e.g., Siegel et al., 2017 CA Cancer J Clin 67:7-30). Many BC patients suffer multiple relapses prior to progression, providing ample lead time for early detection and treatment prior to metastasis (see, e.g., Netto, 2013 Adv Anat Pathol 20:175-203). Urine cytology and cystoscopy with transurethral biopsy (TURB) are currently the gold standard for diagnosis and follow-up in bladder cancer. While urine cytology has value for the detection of high-grade neoplasms, it is unable to detect the vast majority of low-grade tumors (see, e.g., Netto et al., 2016 Urol Clin North Am 43:63-76; Lotan et al., 2003 Urology 61:109-18; and Zhang et al., 2016 Cancer Cytopathol 124:552-564). This fact, together with the high cost and invasive nature of repeated cystoscopy and TURB procedures, has led to many attempts to develop novel noninvasive strategies, including urine- or serum-based genetic and protein assays for screening and surveillance (see, e.g., Kawauchi et al., 2009 Hum Pathol 40:1783-1789; Kruger et al., 2003 Int J Oncol 23:41-48; Skacel et al., 2003 J Urol 169:2101-2105; Sarosdy et al., 2006 J Urol 176:44-47; Moonen et al., 2007 Eur Urol 51:1275-80; Fradet et al., 1997 Can J Urol 4:400-405; Yafi et al., 2015 Urol Oncol 33:66.e25-66.e31; Serizawa et al., 2010 Int J Cancer 129(1):78-87; Kinde et al., 2013 Cancer Res 73:7162-7167; Hurst et al., 2014 Eur Urol 65:367-369; Wang et al., 2014 Oncotarget 5:12428-12439; Ralla et al., 2014 Crit Rev Clin Lab Sci 51:200-231; Ellinger et al., 2015 Expert Rev Mol Diagn 15:505-516; Bansal et al., 2014 Clin Chim Acta 436:97-103; Goodison et al., 2012 PLoS One 7:e47469; and Allory et al., 2014 Eur Urol 65:360-366). Currently available U.S. Food and Drug Administration (FDA) approved assays include the ImmunoCyt test (Scimedx Corp), the nuclear matrix protein 22 (NMP22) immunoassay test (Matritech), and multitarget FISH (UroVysion) (see, e.g., Kawauchi et al., 2009 Hum Pathol 40:1783-1789; Kruger et al., 2003 Int J Oncol 23:41-48; Skacel et al., 2003 J Urol 169:2101-2105; Sarosdy et al., 2006 J Urol 176:44-47; Moonen et al., 2007 Eur Urol 51:1275-80; Fradet et al., 1997 Can J Urol 4:400-405; and Yafi et al., 2015 Urol Oncol 33:66.e25-66.e31). Sensitivities between 62% and 69% and specificities between 79% and 89% have been reported for some of these tests; however, due to assay performance inconsistencies, cost, or required technical expertise, integration of such assays into routine clinical practice has not yet occurred. Further, because urine cytology is relatively insensitive for the detection of recurrence, cystoscopies are performed as often as every three months in such patients in the U.S. The cost of managing these patients is in aggregate higher than the cost of managing any other type of cancer, and amounts to 3 billion dollars annually (see, e.g., Netto et al., 2010 Pathology 42:384-394).

Further, the annual incidence of upper tract urothelial carcinomas (UTUCs) in Western countries is 1-2 cases per 100,000, but these cancers occur at a much higher rate in populations exposed to aristolochic acid (AA) (Chen et al., 2012 Proc Natl Acad Sci USA, 109(21):8241-8246; Grollman et al., 2013 Environ Mol Mutagen, 54(1):1-7; and Lai et al., 2010 J Natl Cancer Inst, 102(3):179-186). Nephroureterectomy can be curative for patients with UTUC when it is detected at an early stage (Li et al., 2008 Eur Urol. 54(5):1127-34). However, these cancers are largely silent until the onset of overt clinical symptoms, typically hematuria, and as a result, most patients are diagnosed only at an advanced stage (Roupret et al., 2015 Eur Urol. 68(5):868-79). Diagnostic tests for the detection of early-stage UTUC are not currently available.

A broadly applicable approach for non-invasive detection of cancer (e.g., an early-stage cancer such as BCs or UTUCs) could be both medically and economically important.

This Example describes a new urine-based test, called UroSEEK, which addresses the problematic issues described above. In this test, DNA from urine samples can be used in an assay (e.g., a PCR-based, multiplex test) to simultaneously assess genetic alterations that commonly occur in BCs or UTUCs. A schematic of the approach used in the bladder cancer study is provided in FIG. 25, and a schematic of the approach used in the UTUC study is provided in FIG. 31.

Materials and Methods

Patients and Samples

Urine samples were collected prospectively from patients in four participating institutions: Johns Hopkins Hospital, Baltimore, Md., USA; A.C. Camargo Cancer Center, Sao Paulo, Brazil; Osaka University Hospital, Osaka, Japan; and Hacettepe University Hospital, Ankara, Turkey. The study was approved by the Institutional Review Boards of Johns Hopkins Hospital and all other participating institutions. Proper material transfer agreements were obtained. Patients with a known history of malignancy other than bladder cancer were excluded from the study. The study included two cohorts of patients.

The Early Detection cohort comprised 570 patients who were referred to a urology clinic in one of the above hospitals because of hematuria or lower urinary tract symptoms (Table 18). The second cohort (322 patients) represented patients with a prior established diagnosis of bladder cancer (BC) who were on surveillance for disease recurrence (Surveillance Cohort). These patients' primary tumors harbored mutations in at least one of the 11 genes assessed through the multiplex or singleplex assays. A minimum follow-up of 12 months from the date of urine collection was required for cases with no evidence of incident or recurrent tumors in the Early Detection or Surveillance cohorts, respectively. Urine samples were collected prior to any procedures, such as cystoscopy, performed during the patients' visits. A total of 892 urine samples were analyzed in the study, composed of two types of samples. The first was residual urinary cells after processing with standard BD SurePath™ liquid-based cytology protocols (Becton Dickinson and Company; Franklin Lakes, N.J., USA). In keeping with standard of care, residual SurePath® fluids were kept refrigerated for 6-8 weeks prior to submission for DNA purification, to allow for any potential need for repeat cytology processing of the same sample. The second sample type was composed of bio-banked fresh urine samples in which 15-25 mL of voided urine was stored at 4° C. for up to 60 min prior to centrifugation (10 min at 500 g), and the pellets were stored at minus 80° C. prior to DNA purification. Urines from 188 healthy individuals of average age 26 were also obtained and processed identically to the bio-banked fresh urine samples.

Formalin-fixed paraffin-embedded (FFPE) tumor tissue samples from trans-urethral resections (TURB) or cystectomies were collected in 413 of the 892 cases. When several different tumors from the same patient were available (because of recurrences), the earliest tumor tissue obtained following the donation of the urine sample was used in the Early Detection Cohort. In the Surveillance Cohort, the tumor preceding the donation of the urine sample was used in 146 of the 322 patients. In the other 176 Surveillance cases, the earliest tissue obtained following the donation of the urine sample was used. A genitourinary pathologist reviewed all histologic slides to confirm the diagnosis and select a representative tumor area with as high tumor cellularity as possible for that case. Corresponding FFPE blocks were cored with a sterile 16-gauge needle. One to three cores were obtained per tumor and placed in 1.5 mL sterile tubes for DNA purification, as described elsewhere (see, e.g., Kinde et al., 2013 Cancer Res 73:7162-7167). Electronic medical records were reviewed to obtain medical history and follow-up data in all patients.

UTUC Cohort Studied

Sequential patients with UTUC scheduled to undergo a radical unilateral nephroureterectomy at National Taiwan University Hospital in 2012-2016 were asked to participate in the study. All patients provided informed consent using the consent form and study design reviewed and approved by the Institutional Review Boards at National Taiwan University and Stony Brook University. A total of 56 UTUC patients were enrolled in the study after excluding four patients with gross hematuria and one patient with a tumor-urine DNA mismatch by identity testing. Urinary cell DNA from 188 urine samples donated by healthy individuals in the U.S. of average age 40, range 19 to 60 years old, was used to assess the specificity of the UroSEEK test. White blood cell (WBC) DNA from 94 normal individuals from the U.S. was used to evaluate the technical specificity of the PCR analysis.

Biological Samples—UTUC Cohort

Urine samples were obtained from patients one day prior to surgery. Urinary cells were isolated by centrifugation at 581 g for 10 minutes at room temperature, washed thrice in saline using the same centrifugation conditions, and stored frozen until DNA was isolated using a Qiagen kit #937255 (Germantown, Md.). DNA was purified from fresh-frozen resected samples of upper tract tumors and renal cortex by standard phenol-chloroform extraction procedures as described elsewhere (see, e.g., Chen et al., 2012 Proc Natl Acad Sci USA, 109(21):8241-8246; and Jelakovic et al., 2012 Kidney Int. 81(6):559-67). One upper urinary tract tumor per patient was analyzed; for cases with tumors at multiple sites, renal pelvic tumors were preferentially selected whenever available. Formalin-fixed, paraffin-embedded tumor samples were staged and graded by a urologic pathologist, and the presence of one or more upper tract urothelial carcinomas was confirmed by histopathology for each enrolled subject. Pertinent clinical and demographic data were obtained by a chart review of each subject. eGFR was calculated by the MDRD equation (see, e.g., Levey et al., 2006 Ann Intern Med. 145(4):247-54) and used to determine CKD stage (see, e.g., Levey et al., 2005 Kidney Int. 67(6):2089-100).
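For orientation, the four-variable MDRD Study equation, as re-expressed for standardized serum creatinine, and the eGFR cut-points of the conventional five-stage CKD classification can be sketched in Python as follows. The text does not spell out which form of the equation or which staging details were applied, so the coefficients shown (175, -1.154, -0.203, 0.742, 1.212) and the function names are assumptions of this sketch rather than a record of the study's calculations.

```python
def egfr_mdrd(serum_creatinine_mg_dl, age_years, female, black=False):
    """Estimated GFR in mL/min/1.73 m^2 by the 4-variable MDRD Study equation."""
    egfr = 175.0 * serum_creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742
    if black:
        egfr *= 1.212
    return egfr

def ckd_stage_from_egfr(egfr):
    """Map eGFR to a CKD stage by GFR cut-points only.

    Stages 1 and 2 additionally require other evidence of kidney damage,
    which this sketch does not model.
    """
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5
```

For a 50-year-old man with a serum creatinine of 1.0 mg/dL, this gives an eGFR of about 79 mL/min/1.73 m^2, which falls in the stage 2 range.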

DNA Adduct Analysis

AL-DNA adduct (7-(deoxyadenosin-N6-yl) aristolactam I; dA-AL-I) levels in 2 μg of DNA from the normal renal cortex of UTUC patients were quantified by ultra-performance liquid chromatography-electrospray ionization/multistage mass spectrometry (UPLC-ESI/MSn) with a linear quadrupole ion trap mass spectrometer (LTQ Velos Pro, Thermo Fisher Scientific, San Jose, Calif.) as described elsewhere (see, e.g., Yun et al., 2012 Chem Res Toxicol. 25(5):1119-31).

Mutation Analysis

Three separate assays were used to search for abnormalities in urinary cell DNA. First, a multiplex PCR was used to detect mutations in regions of ten genes commonly mutated in urologic malignancies: CDKN2A, ERBB2, FGFR3, HRAS, KRAS, MET, MLL, PIK3CA, TP53, and VHL (see, e.g., Netto, 2011 Nat Rev Urol 9:41-51; Mo et al., 2007 J Clin Invest 117:314-325; Sarkis et al., 1993 J Natl Cancer Inst 85:53-59; Lin et al., 2010 Urol Oncol 28:597-602; Sarkis et al., 1994 J Urol 152:388-392; Sarkis et al., 1995 J Clin Oncol 13:1384-1390; Wu, 2005 Nat Rev Cancer 5:713-725; and Cancer Genome Atlas Research Network, 2014 Nature 507:315-322). The primer pairs used for this multiplex PCR were divided into a total of three multiplex reactions, each containing non-overlapping amplicons (see below). These primers were used to amplify DNA in 25 μL reactions as described elsewhere (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108:9530-9535) except that 15 cycles were used for the initial amplification. Second, the TERT promoter region was evaluated. A single amplification primer was used to amplify a 73-bp segment containing the region of the TERT promoter known to harbor mutations in BC (see, e.g., Kinde et al., 2013 Cancer Res 73:7162-7167). The conditions used to amplify it were the same as used in the multiplex reactions described above except that Phusion GC Buffer (Thermo-Fisher) instead of HF buffer was used and 20 cycles were used for the initial amplification. The TERT promoter region could not be included in the multiplex PCR because of its high GC content. PCR products were purified with AMPure XP beads (Beckman Coulter, Pa., USA) and 0.25% of the purified PCR products (multiplex) or 0.0125% of the PCR products (TERT singleplex) were then amplified in a second round of PCR, as described elsewhere (see, e.g., Wang et al., 2016 Elife 5:10.7554/eLife.15175). The PCR products from the second round of amplification were then purified with AMPure and sequenced on an Illumina instrument. For each mutation identified, the mutant allele frequency (MAF) was determined by dividing the number of uniquely identified reads with mutations (see, e.g., Kinde et al., 2011 Proc Natl Acad Sci USA 108:9530-9535) by the number of total uniquely identified reads. Each DNA sample was assessed in two independent PCRs, for both the TERT promoter and multiplex assays, and samples were scored as positive only if both PCRs showed the same mutation. The mutant allele frequencies and number of UIDs listed in Table 19, Table 20, Table 22, and Table 23 refer to the average of the two independent assays.
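For illustration only, the MAF calculation and the duplicate-PCR concordance rule described above can be sketched as follows. The record layout (`WellCall`), the function names, and the example mutation labels are hypothetical and are not taken from the study; the sketch assumes that a mutation is listed for a well only when it was actually observed in that well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WellCall:
    """One mutation observed in one PCR well (hypothetical record layout)."""
    mutation: str      # illustrative label, e.g. "geneX:variant1"
    mutant_uids: int   # uniquely identified reads carrying the mutation
    total_uids: int    # all uniquely identified reads at that position


def maf(call: WellCall) -> float:
    """Mutant allele frequency = mutant UIDs / total UIDs."""
    if call.total_uids <= 0:
        raise ValueError("total_uids must be positive")
    return call.mutant_uids / call.total_uids


def concordant_mutations(well_a, well_b):
    """Keep only mutations seen in both PCRs; report the mean MAF and mean UID count."""
    by_name_b = {c.mutation: c for c in well_b}
    result = {}
    for a in well_a:
        b = by_name_b.get(a.mutation)
        if b is None:
            continue  # seen in only one PCR: not scored as positive
        mean_maf = (maf(a) + maf(b)) / 2
        mean_uids = (a.total_uids + b.total_uids) / 2
        result[a.mutation] = (mean_maf, mean_uids)
    return result


# Invented example data: "geneX:variant1" is seen in both wells, "geneY:variant2" in one.
well_1 = [WellCall("geneX:variant1", 10, 1000), WellCall("geneY:variant2", 5, 1000)]
well_2 = [WellCall("geneX:variant1", 30, 1000)]
calls = concordant_mutations(well_1, well_2)
```

In this sketch only the mutation present in both wells is retained, and its reported MAF is the average of the two wells.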

To evaluate the statistical significance of putative mutations, DNA from white blood cells of 188 unrelated normal individuals was assessed. A variant observed in the samples from cancer patients was only scored as a mutation if it was observed at a much higher MAF than observed in normal WBCs. Specifically, the classification of a sample's ctDNA status was based on two complementary criteria applied to each mutation: 1) the difference between the average MAF in the sample of interest and the corresponding maximum MAF observed for that same mutation in a set of controls, and 2) the Stouffer's Z-score obtained by comparing the MAF in the sample of interest to a distribution of normal controls. To calculate the Z-score, the MAF in the sample of interest was first normalized based on the mutation-specific distributions of MAFs observed among all controls. Following this mutation-specific normalization, a P-value was obtained by comparing the MAF of each mutation in each well with a reference distribution of MAFs built from normal controls where all mutations were included. The Stouffer's Z-score was then calculated from the P-values of two wells, weighted by their number of UIDs. The sample was classified as positive if either the difference or the Stouffer's Z-score of its mutations was above the thresholds determined from the normal WBCs. The threshold for the difference parameter was defined by the highest MAF observed in any normal WBCs. The threshold for the Stouffer's Z-score was chosen to allow one false positive among the 188 normal urine samples studied.
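As a minimal sketch of the weighted combination described above, the following assumes the standard weighted Stouffer formula, Z = Σ(w·z) / sqrt(Σ w²), with each per-well P-value converted to a one-sided z-score and the UID counts used as weights. The function names and the threshold parameters are hypothetical; the normalization step and the actual threshold values used in the study are not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def weighted_stouffer_z(p_values, weights):
    """Combine one-sided P-values into a single Z-score, weighting each well.

    Each P-value must lie strictly between 0 and 1 (NormalDist.inv_cdf
    raises an error otherwise).
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("need equally sized, non-empty inputs")
    nd = NormalDist()
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def sample_is_positive(maf_difference, z_score, diff_threshold, z_threshold):
    """Either criterion exceeding its threshold scores the sample as positive.

    Both thresholds are placeholders supplied by the caller.
    """
    return maf_difference > diff_threshold or z_score > z_threshold


# Two wells with equal weights and P = 0.025 each (invented values).
z = weighted_stouffer_z([0.025, 0.025], [1000, 1000])
```

With equal weights the formula reduces to the sum of the two z-scores divided by the square root of two, so two wells that are each borderline on their own yield a stronger combined score.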

Analysis of Aneuploidy.

Aneuploidy was assessed with Fast-SeqS, which uses a single primer pair to amplify ~38,000 loci scattered throughout the genome (see, e.g., Kinde et al., 2012 PLoS One 7:e41162). After massively parallel sequencing, gains or losses of each of the 39 chromosome arms covered by the assay were determined using a bespoke statistical learning method. A support vector machine (SVM) was used to discriminate between aneuploid and euploid samples. The SVM was trained using 3150 low neoplastic cell fraction synthetic aneuploid samples and 677 euploid peripheral white blood cell (WBC) samples. Samples were scored as positive when the genome-wide aneuploidy score was >0.7 and there was at least one gain or loss of a chromosome arm.
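The final scoring rule stated above (score greater than 0.7 and at least one arm-level gain or loss) can be written as a short sketch. The SVM and the arm-level statistics are not reproduced; the function name, the dictionary layout, and the "gain"/"loss"/"none" labels are assumptions made for illustration.

```python
def aneuploidy_positive(genome_score, arm_calls, score_cutoff=0.7):
    """Apply the two-part rule: score above the cutoff AND at least one altered arm.

    arm_calls maps a chromosome arm name to "gain", "loss", or "none"
    (a hypothetical encoding).
    """
    altered_arms = [arm for arm, call in arm_calls.items() if call in ("gain", "loss")]
    return genome_score > score_cutoff and len(altered_arms) >= 1


# Invented example: a high score with one gained arm is scored positive.
example = aneuploidy_positive(0.9, {"8q": "gain", "9p": "none"})
```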

Identity Checks.

A multiplex reaction containing 26 primers detecting 31 common SNPs on chromosomes 10 and 20 was performed using the amplification conditions described above for the multiplex PCR. The primers used for this identity evaluation are listed in FIG. 34 (Table 42) for bladder cancer cohorts and FIG. 33 (Table 41) for UTUC cohorts.

Statistical Analysis

Performance characteristics of urine cytology, UroSEEK, and its three components were calculated using MedCalc statistical software (medcalc.org/calc/diagnostic_test.php).
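For readers who wish to reproduce intervals of this kind, the following sketch computes an exact binomial (Clopper-Pearson) confidence interval using only the Python standard library. That the calculator cited above uses this same interval is an assumption made here, not a statement from the study; the function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(func, target, iterations=100):
    """Bisection on [0, 1] for a function that decreases as p increases."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    lower = 0.0
    if successes > 0:
        lower = _root_of_decreasing(
            lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2)
    upper = 1.0
    if successes < n:
        upper = _root_of_decreasing(
            lambda p: binom_cdf(successes, n, p), alpha / 2)
    return lower, upper


# 188 of 188 healthy-control samples negative: specificity 100%.
low, high = clopper_pearson(188, 188)
```

For 188 negatives out of 188 healthy controls, the lower bound is approximately 0.98 and the upper bound is 1.0, consistent with the "98% to 100%" interval reported for the multiplex assay in healthy individuals.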

Results

Early Detection Cohort characteristics.

A flow diagram indicating the number of patients evaluated in this study and the major results is provided in FIG. 26.

A total of 570 patients were included in the Early Detection cohort, each with one urine sample analyzed. 90% of the patients had hematuria, 3% had lower urinary tract symptoms (LUTS), and 9% had other indications suggesting they were at risk for BC. The median age of the participants was 58 years (range 5 to 89) (Table 18). 70% of the patients were male. 175 (31%) of the patients developed BC after a median follow-up period of 18 months (range 0 to 40 months). For each patient who developed BC, two other patients were selected who presented with similar symptoms but did not develop BC during the follow-up period. By design, then, the fraction of cases in this cohort developing BC was higher than the fraction (5%) of patients with similar presentations that would have developed BC in standard clinical practice. The characteristics of the tumors developing in the 570 patients are summarized below.

Demographic, clinical and genetic features of the early detection cohort.

Category | n | % | Ten-gene multiplex positive | TERT positive | Aneuploidy positive | UroSEEK positive | Cytology positive* | UroSEEK or cytology positive*
Gender
Males without recurrence | 172 | 59% | 3 (2%) | 10 (6%) | 2 (1%) | 13 (8%) | 0 (0%) | 13 (8%)
Males with recurrence | 32 | 11% | 26 (81%) | 21 (66%) | 19 (59%) | 29 (91%) | 16 (50%) | 30 (94%)
Females without recurrence | 81 | 28% | 2 (2%) | 2 (2%) | 1 (1%) | 5 (6%) | 0 (0%) | 5 (6%)
Females with recurrence | 9 | 3% | 4 (44%) | 4 (44%) | 3 (33%) | 6 (67%) | 1 (11%) | 6 (67%)
Indication
Hematuria without recurrence | 346 | 61% | 6 (2%) | 15 (4%) | 5 (1%) | 22 (6%) | 0 (0%) | 17 (5%)
Hematuria with recurrence | 163 | 29% | 108 (66%) | 90 (55%) | 76 (47%) | 134 (82%) | 18 (11%) | 32 (2%)
LUTS without recurrence | 11 | 2% | 0 (0%) | 2 (18%) | 0 (0%) | 2 (18%) | 0 (0%) | 2 (18%)
LUTS with recurrence | 3 | 1% | 2 (67%) | 1 (33%) | 0 (0%) | 2 (67%) | 1 (33%) | 2 (67%)
Other without recurrence | 38 | 7% | 1 (3%) | 0 (0%) | 1 (3%) | 2 (5%) | 0 (0%) | 2 (5%)
Other with recurrence | 9 | 2% | 9 (100%) | 8 (89%) | 5 (56%) | 9 (100%) | 2 (22%) | 9 (100%)
Detected Tumor Diagnosis
PUNLMP | 2 | 1% | 0 (0%) | 1 (50%) | 0 (0%) | 1 (50%) | 0 (0%) | 0 (0%)
CIS | 7 | 5% | 4 (57%) | 4 (57%) | 1 (14%) | 6 (86%) | 3 (43%) | 6 (86%)
LGTCC | 31 | 21% | 15 (48%) | 18 (58%) | 9 (29%) | 22 (71%) | 0 (0%) | 4 (13%)
HGTCC | 49 | 33% | 34 (69%) | 28 (57%) | 26 (53%) | 40 (82%) | 4 (8%) | 11 (22%)
INTCC | 61 | 41% | 48 (79%) | 36 (59%) | 35 (57%) | 57 (93%) | 9 (15%) | 16 (26%)
Cytology diagnosis*
Positive | 21 | 6% | 16 (76%) | 12 (57%) | 16 (76%) | 20 (95%) | N/A | N/A
Atypical | 105 | 30% | 21 (20%) | 21 (30%) | 12 (11%) | 30 (29%) | N/A | N/A
Negative | 221 | 64% | 4 (2%) | 9 (4%) | 1 (0.4%) | 12 (5%) | N/A | N/A

*Cytology was available on only a subset of cases. N/A, not available.

Genetic Analysis in Bladder Cancer Cohorts.

Three separate tests were performed for genetic abnormalities that might be found in urinary cells derived from BC (FIG. 26). First, mutations were evaluated in selected regions of ten genes that have been shown to be frequently altered in urothelial tumors (Table 19). For this purpose, a specific set of primers was designed that allowed detection of mutations in as few as 0.03% of urinary cells. The capacity to detect such low mutant fractions was a result of the incorporation of molecular barcodes in each of the primers, thereby substantially reducing the artifacts associated with massively parallel sequencing. Second, TERT promoter mutations were evaluated. A singleplex PCR was used for this analysis because the unusually high GC-content of the TERT promoter precluded its inclusion in the multiplex PCR design. Third, the extent of aneuploidy was evaluated using a technique in which a single PCR is used to co-amplify ~38,000 members of a subfamily of long interspersed nucleotide element-1 (L1 retrotransposons, also called LINEs). L1 retrotransposons, like other human repeats, have spread throughout the genome via retrotransposition and are found on all 39 non-acrocentric autosomal arms.

The multiplex assay detected mutations in 68% of the 175 urinary cell samples from the individuals that developed BC during the course of this study (95% CI 61% to 75%) (Table 19). A total of 246 mutations were detected in eight of the ten target genes (FIG. 27A and Table 19). The mean mutant allele frequency in the urinary cells with detectable mutations was 18% and ranged from 0.17% to 99%. The most commonly altered genes were TP53 (45% of the total mutations) and FGFR3 (20% of the total mutations; FIG. 27A). At the thresholds used, 1.7% of the 395 patients in the Early Detection Cohort who did not develop BC during the course of the study had a detectable mutation in any of the ten genes. At the same thresholds, none of the 188 urinary cell samples from healthy individuals had any mutation in any of the ten genes assayed (100% specificity, 95% CI 98% to 100%).

Mutations in the TERT promoter were detected in 57% of the 175 urinary cell samples from the patients that developed cancer during the study interval (95% CI 49% to 64%; Table 20). The mean TERT mutant allele frequency in the urinary cells was 14% and ranged from 0.18% to 78%. Mutations were detected in three positions: 98% of the mutations were at hg1295228 (79%) and hg1295250 (19%), which are 66 and 88 bp upstream of the transcription start site, respectively. These positions have been previously shown to be involved in the appropriate transcriptional regulation of TERT. In particular, the mutant alleles recruit the GABPA/B1 transcription factor, resulting in the H3K4me2/3 mark of active chromatin and reversing the epigenetic silencing present in normal cells. 4% of the 395 patients in this cohort who did not develop BC during the course of the study had a detectable mutation in the TERT promoter. Only one of the 188 urinary samples from healthy individuals harbored a TERT promoter mutation.

Aneuploidy was detected in 46% (95% CI 39% to 54%) of the 175 urinary cell samples from the patients that developed BC during the course of the study (Table 20 and Table 21). The most commonly altered arms were 5q, 8q, and 9p. All three of these arms harbor well-known oncogenes and tumor suppressor genes which have been shown to undergo copy number alterations in many cancers, including BC. 1.5% of the urinary cell samples from the 395 patients who did not develop BC during the course of the study exhibited aneuploidy. None of the 188 urinary samples from healthy individuals exhibited aneuploidy when assessed with the same technology.

Comparison with Primary Tumors

Tumor samples from 102 of the patients enrolled in this cohort were available for comparison and were studied with the same three assays used to study the urinary cell samples (Table 20). In 91 (89%) of these 102 cancers, at least one of the eleven genes studied (the 10-gene panel or the TERT promoter) was mutated. Moreover, at least one of the mutations identified in the urine samples from these 102 patients was also identified in 83% of the corresponding BCs (Table 19 and Table 20). Analysis of the BCs also shed light on the basis for “false negatives”, i.e., the reason that 21% of urine samples from patients who developed BC had no detectable mutations in the 11 genes tested. The reason could either have been that the corresponding BC did not harbor a mutation in these 11 genes or that it did, but the fraction of neoplastic cells in the urine sample was not high enough to allow its detection with the assays used. At least one mutation in at least one of the 11 genes was identified in 62% of the primary tumors from patients with false negative urine tests for mutations (Table 22 and Table 23). The results indicate that 38% of the 29 false negative tests for mutations were due to the fact that none of the queried mutations were present in the tumor and that the other 62% of the false negatives were due to insufficient amounts of cancer cells in the urine.

UroSEEK: Biomarkers in Combination.

As noted above, the ten-gene multiplex assay, the TERT singleplex assay, and the aneuploidy assays yielded 68%, 57%, and 46% sensitivities, respectively, when used separately (Table 19, Table 20, and Table 21). 45 samples without TERT promoter mutations could be detected by mutations in one of the other ten genes (FIG. 28A and Table 19). Conversely, 35 samples without detectable mutations in the multiplex assay could be detected by virtue of TERT promoter mutations (FIG. 28A and Table 20). Ten of the urinary cell samples without any detectable mutations in the 11 genes could be detected by the assay for aneuploidy (FIG. 28A and Table 21). Thus, when the three assays were used together (test termed “UroSEEK”), and a positive result in any one of the assays was sufficient to score a sample as positive, the sensitivity rose to 83% (95% CI 76% to 88%). Only one of the 188 samples from healthy individuals was scored positive by UroSEEK (specificity 99.5%, CI 97% to 100%). Twenty-six (6.5%) of the 395 patients in this cohort who did not develop BC during the course of the study scored positive by the UroSEEK test (specificity 93%, CI 91% to 96%). On average, UroSEEK positivity preceded the diagnosis of BC by 2.3 months, and in eight cases by more than a year (FIG. 29A and Table 18).
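The combination rule described above, in which a positive result in any one of the three assays scores the sample as positive, can be sketched as follows. The function names and the six-sample data set are invented for illustration and do not correspond to any patient in the study.

```python
def uroseek_call(multiplex_pos, tert_pos, aneuploidy_pos):
    """A sample is scored positive if any one of the three assays is positive."""
    return multiplex_pos or tert_pos or aneuploidy_pos


def sensitivity_specificity(calls, has_cancer):
    """Return (sensitivity, specificity) from paired test calls and outcomes."""
    pairs = list(zip(calls, has_cancer))
    tp = sum(1 for call, disease in pairs if call and disease)
    fn = sum(1 for call, disease in pairs if not call and disease)
    tn = sum(1 for call, disease in pairs if not call and not disease)
    fp = sum(1 for call, disease in pairs if call and not disease)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: (multiplex, TERT, aneuploidy) per sample, then outcome.
assay_results = [
    (True, False, False),
    (False, True, False),
    (False, False, True),
    (False, False, False),
    (False, False, False),
    (False, True, False),
]
developed_cancer = [True, True, True, True, False, False]

calls = [uroseek_call(*r) for r in assay_results]
sens, spec = sensitivity_specificity(calls, developed_cancer)
```

The toy data illustrate the trade-off inherent in an "any positive" rule: each added assay can only raise sensitivity, while every false positive from any single assay lowers the specificity of the combination.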

UroSEEK Plus Cytology

As both cytology and UroSEEK tests are non-invasive and can be performed on the same urine sample, their performance in combination was assessed. There were 347 patients in the Early Detection cohort in whom cytology was available (Table 18). Among the 40 patients who developed biopsy-proven cancer in this cohort, 17 were positive by cytology (43% sensitivity). None of the 299 patients that did not develop cancer were positive by cytology (100% specificity). UroSEEK was positive in 100% of the 17 cancer patients whose urines were positive by cytology and in 95% of the 23 cancer patients whose urines were negative by cytology. Thus, in combination, UroSEEK plus cytology afforded 95% (95% CI 83% to 99%) sensitivity, an increase of 12 percentage points over UroSEEK and of 52 percentage points over cytology. Among the 299 patients in the early detection cohort who did not develop BC during the course of the study, 20 (6.6%) were positive by UroSEEK or cytology, giving the combination of UroSEEK and cytology a specificity of 93% (95% CI 90% to 96%).

Surveillance Cohort Characteristics

The strategy for surveillance was different from the one used for early detection. Patients in whom a BC was surgically excised for treatment and diagnosis generally have tumor tissue available, and in most such tumors, a mutation can be identified. For example, it was found during the course of this study that a mutation in at least one of the 11 queried genes was present in 95.2% of BCs evaluated. All patients selected for the surveillance study had biopsy confirmed BC and had a urine sample collected 0-5 years after surgery. A total of 322 patients that donated urines and whose BC contained a mutation in at least one of the 11 genes analyzed were evaluated. It was determined whether a single urine sample taken a relatively short time following surgical excision of the BC could reveal residual disease in these 322 patients, as evidenced by later recurrence. 187 (58%) of the 322 patients developed clinically evident BC after a median follow-up period of 10.7 months (range 0 to 51 months). The histopathologic types and tumor stages of these patients are summarized below and detailed in Table 24. The median age of the participants was 62 (range 20 to 93). As expected from the demographics of BC, 75% of the patients were male.

Demographic, clinical and genetic features of the Surveillance cohort.

Category | n | % | Ten-gene multiplex positive | TERT positive | Aneuploidy positive | UroSEEK positive | Cytology positive* | UroSEEK or cytology positive*
Gender
Males without recurrence | 59 | 30% | 3 (5%) | 8 (14%) | 3 (5%) | 10 (17%) | 0 (0%) | 8 (14%)
Males with recurrence | 90 | 45% | 45 (50%) | 53 (59%) | 20 (22%) | 59 (66%) | 20 (22%) | 53 (59%)
Females without recurrence | 17 | 9% | 5 (29%) | 3 (18%) | 0 (0%) | 6 (35%) | 0 (0%) | 6 (35%)
Females with recurrence | 33 | 17% | 15 (45%) | 19 (58%) | 11 (33%) | 33 (100%) | 6 (18%) | 19 (58%)
Original Tumor Diagnosis
PUNLMP | 12 | 4% | 5 (42%) | 2 (17%) | 1 (8%) | 6 (50%) | 0 (0%) | 2 (17%)
CIS | 25 | 8% | 11 (44%) | 13 (52%) | 6 (24%) | 14 (56%) | 5 (20%) | 10 (40%)
LGTCC | 107 | 35% | 27 (25%) | 34 (32%) | 8 (7%) | 41 (38%) | 0 (0%) | 59 (55%)
HGTCC | 62 | 20% | 22 (36%) | 24 (39%) | 10 (16%) | 30 (49%) | 4 (7%) | 16 (26%)
INTCC | 104 | 34% | 39 (38%) | 47 (45%) | 29 (28%) | 54 (52%) | 20 (19%) | 34 (33%)
Original Tumor Stage
pTis | 25 | 8% | 11 (44%) | 13 (52%) | 6 (24%) | 14 (56%) | 5 (20%) | 10 (40%)
pTa | 181 | 58% | 54 (30%) | 60 (33%) | 19 (19%) | 77 (43%) | 4 (2%) | 77 (43%)
pT1 | 71 | 23% | 28 (39%) | 35 (49%) | 22 (31%) | 39 (55%) | 14 (20%) | 23 (32%)
pT2 | 23 | 7% | 9 (9%) | 9 (39%) | 7 (30%) | 12 (52%) | 5 (22%) | 10 (43%)
pT3 | 9 | 3% | 1 (11%) | 2 (22%) | 0 | 2 (22%) | 1 (11%) | 1 (11%)
pT4 | 1 | 0.3% | 1 (100%) | 1 (100%) | 0 | 1 (100%) | N/A | N/A
Routine cytology diagnosis*
Positive | 30 | 15% | 21 (21%) | 25 (83%) | 20 (67%) | 27 (90%) | N/A | N/A
Atypical | 95 | 48% | 38 (40%) | 43 (45%) | 18 (19%) | 50 (53%) | N/A | N/A
Negative | 71 | 36% | 12 (17%) | 13 (18%) | 3 (4%) | 19 (27%) | N/A | N/A

*Cytology was available on only a subset of cases. N/A, not available.

Genetic Analysis of Surveillance Cohort

The multiplex assay in urinary cells detected mutations in 49% of the urinary cell samples from patients that developed recurrent BC during the study interval (95% CI 45% to 60%; Table 24 and Table 25). The mean mutant allele frequency in the urinary cells with detectable mutations was 16% and ranged from 0.08% to 93%. The most commonly altered genes were FGFR3 (43% of the 134 mutations) and TP53 (30% of the 134 mutations; FIG. 27B). Seven percent of the 135 patients who did not develop recurrent BC during the course of the study had a detectable mutation in their urinary cell sample (these are considered to be false positives; see Discussion). The mean interval between a positive multiplex assay test and the diagnosis of recurrent BC was 7 months (range 0 to 51 months).

Mutations in the TERT promoter were detected in 51% of the urinary cell samples from patients that developed recurrent BC during the study interval (95% CI 44% to 58%; Table 26). The mean TERT mutant allele frequency in the urinary cells with detectable mutations was 6% and ranged from 0.23% to 43%. Mutations were detected in the same three positions observed in the urinary cells of the Early Detection cohort. Ten percent of the 135 patients who did not develop recurrent BC during the course of the study had a detectable TERT promoter mutation in their urine sample (false positives), corresponding to a specificity of 90% (95% CI 83% to 94%). The mean interval between a positive TERT test and the diagnosis of recurrent BC was 7 months (range 0 to 40 months).

Aneuploidy was detected in 30% (95% CI 24% to 37%) of the urinary cell samples from the patients that developed recurrent BC during the course of the study (Table 27). The most commonly altered arms were 8p, 8q, and 9p, as in the Early Detection cohort. Two percent of the 135 patients who did not develop recurrent BC during the course of the study exhibited aneuploidy in at least one of their urinary cell samples.

Markers in Combination—Surveillance Cohort

As noted above, the ten-gene multiplex assay, the TERT singleplex assay, and the aneuploidy assays yielded 49%, 51%, and 30% sensitivities, respectively, when used separately (Table 25, Table 26, and Table 27). Thirty-two samples without TERT promoter mutations could be detected by mutations in one of the other ten genes (FIG. 28B and Table 25). Conversely, 41 samples without detectable mutations in the multiplex assay could be detected by virtue of TERT promoter mutations. Three of the urinary cell samples without any detectable mutations could be detected by the assay for aneuploidy. Thus, the sensitivity of UroSEEK was 66% (95% CI 59% to 73%). Fourteen percent of the 135 patients in this cohort who did not develop BC during the course of the study scored positive by the UroSEEK test, yielding a specificity of 86% (95% CI 77% to 91%). On average, UroSEEK positivity preceded the diagnosis of BC by 7 months, and in 47 cases by more than one year (FIG. 29B and Table 24).

There were 196 patients in the Surveillance cohort for whom cytology was available (Table 24). Among the 120 patients who developed recurrent BC in this cohort, 30 (25%) were positive by cytology. Conversely, no positive cytology results were observed in patients whose tumors did not recur. UroSEEK was positive in 90% of the recurrent BC patients whose urines were positive by cytology and in 61% of the 90 recurrent BC patients whose urines were negative by cytology. Thus, in combination, UroSEEK plus cytology afforded 71% sensitivity (95% CI 61.84% to 78.77%) (FIG. 28D and Table 22). Among the 76 patients who did not develop recurrent BC during the course of the study and in whom cytology was available, 18% scored as positive by either cytology or UroSEEK, affording a specificity of 82% (95% CI 71% to 90%).
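The 71% combined sensitivity follows from the counts given above, as the short worked calculation below shows. The count of 55 cytology-negative patients detected by UroSEEK is inferred here from the rounded 61% figure and is therefore an assumption; the variable names are illustrative.

```python
# Counts stated in the text for recurrent BC patients with cytology available.
recurrent_total = 120
cytology_positive = 30
cytology_negative = recurrent_total - cytology_positive  # 90 patients

# UroSEEK detected 61% of the cytology-negative recurrences (rounded figure).
uroseek_rate_in_cytology_negative = 0.61
extra_detected = round(uroseek_rate_in_cytology_negative * cytology_negative)

# A patient counts as detected if positive by cytology OR by UroSEEK.
combined_detected = cytology_positive + extra_detected
combined_sensitivity = combined_detected / recurrent_total
```

Under this assumption, 30 + 55 = 85 of the 120 recurrences are detected by at least one test, which is 70.8% and rounds to the 71% reported.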

Low Vs. High Grade Urothelial Neoplasms in Both Early Detection and Surveillance Cohorts

The advantage of UroSEEK over cytology was particularly evident in low-grade tumors (papillary urothelial neoplasms of low malignant potential and non-invasive low grade papillary urothelial carcinomas). There were a total of 49 low-grade tumors evaluated in this study in whom cytology was available (six from the Early Detection cohort and 43 from the Surveillance cohort). None of these low-grade tumors were detected by cytology (0% sensitivity; 95% CI 0.0% to 6.7%). In contrast, UroSEEK detected 67% (95% CI 51% to 81%) of the low-grade tumors (identical rate of 67% in both cohorts; FIG. 30). Analogously, there were a total of 102 high-grade tumors (in-situ urothelial carcinoma, non-invasive high grade papillary urothelial carcinoma or infiltrating high grade urothelial carcinoma) evaluated in this study in whom cytology was available (34 from the Early Detection cohort and 68 in the Surveillance cohort). Cytology was positive in 45% of these patients (50% and 41% in the Early Detection and Surveillance cohorts, respectively) while UroSEEK was positive in 80% of them (100% and 71% in the Early Detection and Surveillance cohorts, respectively; see below).

Summary of the performance of Cytology vs. UroSEEK.

Biopsy outcome/diagnosis | n | Cytology positive test | Cytology negative test | Cytology sensitivity | Cytology 95% CI | UroSEEK positive test | UroSEEK negative test | UroSEEK sensitivity | UroSEEK 95% CI
PUNLMP/LGTCC | 49 | 0 | 49 | 0% | 0.00% to 6.06% | 33 | 16 | 67% | 54.36% to 79.38%
CIS/HGTCC/INTCC | 102 | 46 | 56 | 45% | 33.63% to 52.21% | 82 | 20 | 80% | 71.03% to 86.39%
Total | 151

UTUC Cohort Characteristics

Thirty-two females and twenty-four males ranging in age from 39 to 85 years participated in the study (see below; individual data are in Table 28). This gender distribution, atypical of UTUC patients in Western countries where males predominate (Shariat et al., 2011 World J Urol. 29(4):481-6), is consistent with previous epidemiologic studies of Taiwanese individuals with known exposures to AA (see, e.g., Chen et al., 2012 Proc Natl Acad Sci USA, 109(21):8241-8246). Tobacco use was reported by 18% of this cohort, all males. Based on estimated glomerular filtration rate (eGFR) values, renal function was unimpaired (chronic kidney disease (CKD) stage 0-2) in 45% of the subjects, while mild-to-moderate renal disease (CKD stage 3) or severe disease (CKD stages 4-5) was noted for 43% and 12% of the cohort, respectively.

Demographic, clinical and genetic features of the UTUC cohort stratified by UroSEEK results.

Category | n | % | Ten-gene multiplex positive | TERT positive | Aneuploidy positive | UroSEEK positive
All subjects | 56 | 100% | 64% | 29% | 39% | 75%
Gender
Males | 24 | 43% | 71% | 33% | 54% | 83%
Females | 32 | 57% | 59% | 25% | 28% | 69%
CKD stage
0-2 | 25 | 45% | 68% | 36% | 44% | 76%
3A | 14 | 25% | 50% | 21% | 43% | 71%
3B | 10 | 18% | 80% | 20% | 40% | 80%
4 | 4 | 7% | 25% | 50% | 0% | 50%
5 | 3 | 5% | 100% | 0% | 33% | 100%
Tumor grade
Low | 6 | 11% | 67% | 50% | 17% | 67%
High | 50 | 89% | 64% | 26% | 42% | 76%
Tumor stage
Ta | 11 | 20% | 73% | 55% | 45% | 82%
T1 | 8 | 14% | 50% | 0% | 38% | 75%
T2 | 10 | 18% | 80% | 20% | 10% | 80%
T3 | 24 | 43% | 67% | 33% | 54% | 79%
T4 | 3 | 5% | 0% | 0% | 0% | 0%
Upper urinary tract tumor site
Lower ureter | 17 | 30% | 76% | 18% | 35% | 76%
Upper ureter | 1 | 2% | 100% | 0% | 0% | 100%
Ureterovesical junction | 2 | 4% | 0% | 0% | 0% | 0%
Lower ureter and upper ureter | 2 | 4% | 100% | 50% | 50% | 100%
Renal pelvis | 21 | 38% | 57% | 38% | 38% | 76%
Renal pelvis and lower ureter | 4 | 7% | 75% | 25% | 50% | 100%
Renal pelvis and upper ureter | 5 | 9% | 40% | 40% | 60% | 60%
Renal pelvis, lower ureter, upper ureter | 4 | 7% | 75% | 25% | 50% | 75%
Synchronous bladder cancer
Present | 21 | 38% | 52% | 29% | 33% | 62%
Absent | 35 | 63% | 71% | 29% | 43% | 83%
UTUC risk factors
Aristolactam-DNA adducts present | 54 | 96% | 65% | 30% | 39% | 74%
Smoking history | 10 | 18% | 70% | 30% | 60% | 70%

CKD, chronic kidney disease.

Tumors were confined to a single site along the upper urinary tract in the majority of cases (38% renal pelvis; 39% ureter), while multifocal tumors affecting both renal pelvis and ureter occurred in 23% of the patients. Synchronous bladder cancer (diagnosed within 3 months prior to nephroureterectomy) was present in 38%. Histologically, 89% of the tumors were classified as high grade, with the majority categorized as muscle-invasive (T2-T4, 66%).

Mutational Analysis—UTUC Cohort

Three separate tests were performed for genetic abnormalities that might be found in urinary cells derived from UTUCs (FIG. 32, Table 29, Table 30, Table 31, and FIG. 33). First, mutations were evaluated in selected exomic regions of ten genes (CDKN2A, ERBB2, FGFR3, HRAS, KRAS, MET, MLL, PIK3CA, TP53, and VHL) that are frequently altered in urologic tumors (Sfakianos et al., 2015). For this purpose, a specific set of multiplex primers was designed that allowed detection of mutations in as few as 0.03% of urinary cells (Table 40). The capacity to detect such low mutant fractions was a result of the incorporation of molecular barcodes in each of the primers, thereby substantially reducing the artifacts associated with massively parallel sequencing. Second, TERT promoter mutations were evaluated, based on prior evidence that TERT promoter mutations are often found in UTUCs. A singleplex PCR was used for this analysis because the unusually high GC-content of the TERT promoter precluded its inclusion in the multiplex PCR design. Third, the extent of aneuploidy was evaluated using a technique in which a single PCR is used to co-amplify ~38,000 members of a subfamily of long interspersed nucleotide element-1 (L1 retrotransposons). L1 retrotransposons, like other human repeats, have spread throughout the genome via retrotransposition and are found on all 39 non-acrocentric autosomal arms.

The multiplex assay detected mutations in 36 of the 56 urinary cell samples from UTUC patients (64%, 95% CI 51% to 76%) (Table 29). A total of 57 mutations were detected in nine of the ten target genes (FIG. 34). The median mutant allele frequency (MAF) in the urinary cells was 5.6% and ranged from 0.3% to 80%. The most commonly altered genes were TP53 (58% of the 57 mutations) and FGFR3 (16% of the 57 mutations) (Table 18). None of the 188 urinary cell samples from healthy individuals had a detectable mutation in any of the ten genes assayed (100% specificity, CI 97.5% to 100%).

Mutations in the TERT promoter were detected in 16 of the 56 urinary cell samples from UTUC patients (29%, 95% CI 18% to 42%) (Table 30). The median TERT MAF in the urinary cells was 2.22% and ranged from 0.59% to 46.3%. One of the 188 urinary samples from healthy individuals harbored a mutation (TERT g.1295250C>T with a MAF of 0.39%). In the UTUC urinary cell samples, mutations were detected in three positions: 94% of the mutations were at hg1295228 (67%) and hg1295250 (28%), which are 69 and 91 bp upstream of the transcription start site, respectively. These positions have been previously shown to be involved in the appropriate transcriptional regulation of TERT. In particular, the mutant alleles recruit the GABPA/B1 transcription factor, resulting in the H3K4me2/3 mark of active chromatin and reversing the epigenetic silencing present in normal cells.

Aristolochic Acid Exposure in the UTUC Cohort

The activated metabolites of aristolochic acid bind covalently to the exocyclic amino groups in purine bases, with a preference for dA, leading to characteristic A>T transversions. To determine whether the individuals in the cohort had been exposed to AA, renal cortical DNA adducts were quantified using mass spectrometry. All but two of the 56 patients had detectable aristolactam (AL)-DNA adducts with levels ranging from 0.4 to 68 dA-AL adducts per 10^8 nucleotides. Moreover, the A>T signature mutation associated with AA was highly represented in the mutational spectra of TP53 (18/32 A>T) and HRAS (2/2 A>T) found in urinary cells (Table 30).

Aneuploidy Analysis in the UTUC Cohort

Aneuploidy was detected in 22 of the 56 urinary cell samples from UTUC patients (39%, 95% CI 28% to 52%, Table 31, and FIG. 33) but in none of the 188 urinary cell samples from healthy individuals. The most commonly altered arms were 1q, 7q, 8q, 17p, and 18q. Some of these arms harbor well-known oncogenes or tumor suppressor genes that have been shown to undergo changes in copy numbers in many cancers (Vogelstein et al., 2013).

Comparison with Primary Tumors—the UTUC Cohort

Tumor samples from all 56 patients enrolled in this study were available for comparison and were studied with the same three assays used to analyze the urinary cell samples. This comparison served two purposes. First, it allowed determining if the mutations identified in the urinary cells were derived from the available tumor specimen from the same patient. There were a total of 39 UTUC cases in which a mutation could be identified in the urinary cells. In 35 (90%) of these 39 cases, at least one of the mutations identified in the urine sample (Table 29 and Table 30) was also identified in the corresponding tumor DNA sample (Table 32 and Table 33). When all 80 mutations identified in the urinary cells were considered, 63 (79%) were identified in the corresponding tumor sample (Table 32 and Table 33). In any of the three assays, the discrepancies between urine and tumor samples might be explained by the fact that only one tumor per patient was accessible, even though more than one anatomically distinct tumor was often evident clinically. Additionally, DNA was extracted from only one piece of tissue from each tumor, and intratumoral heterogeneity could have been responsible for some of the discrepancies.

Second, the tumor data helped determine why 17 of the 56 urinary cell samples from UTUC patients did not contain detectable mutations. The reason could have been either that the primary tumors did not harbor a mutation present in the gene panel or that the primary tumor did contain such a mutation but the fraction of neoplastic cells in the urine sample was not high enough to allow its detection. From the evaluation of the primary tumor samples, it was found that four (24%) of the 17 urine samples without detectable mutations were from patients whose tumors did not contain any of the queried mutations (Table 32). The conclusion was that the main reason for failure of the mutation test was an insufficient number of cancer cells in the urine, which accounted for 13 (76%) of the 17 failures.

There were 22 cases in which aneuploidy was observed in the urinary cell samples. Overall, 96% of the chromosome gains or losses observed in the urinary cells were also observed in the primary tumors (examples in FIG. 35). Conversely, there were 34 cases in which aneuploidy was not observed in the urinary cell samples. Evaluation of the 56 tumors with the same assay showed that all but three were aneuploid, so, as with mutations, the main reason for failure of the aneuploidy assay was an insufficient amount of neoplastic DNA in the urinary cells.

Biomarkers in Combination—the UTUC Cohort

There are two factors that can limit sensitivity for genetically based biomarkers. First, a sample can only be scored as positive for the biomarker if it contains DNA from a sufficient number of neoplastic cells to be detected by the assay. Second, the tumor from which the neoplastic cells were derived must harbor the genetic alteration that is queried. Combination assays can increase sensitivity by assessing more genetic alterations, and are thereby more likely to detect at least one genetic alteration present in the tumor. However, mutations in clinical samples often are present at low allele frequencies (Table 29 and Table 30), requiring high coverage of every base queried. It would be prohibitively expensive to perform whole exome sequencing at 10,000× coverage. In this study, the selected regions of 11 genes (including TERT) were carefully evaluated together with copy number analysis of 39 chromosome arms. Even if a tumor does not contain a genetic alteration in one of the 11 genes assessed, it might still be aneuploid and detectable by the urinary cell assay for aneuploidy. The sensitivity of aneuploidy detection is less than that of the mutation assays. Simulations showed that DNA containing a minimum of 1% neoplastic cells is required for reliable aneuploidy detection, while mutations present in as few as 0.03% of the DNA templates can be detected by the mutation assays used in this study. Nevertheless, urinary cell samples that had relatively high fractions of neoplastic cells but did not contain a detectable mutation in the 11 queried genes should still be detectable by virtue of their aneuploidy because, as noted above, 53/56 UTUCs studied here were aneuploid. Additionally, some of the mutations in the 11 genes queried, such as large insertions or deletions or complex changes, might be undetectable by mutation-based assays, but a sample with such an undetectable mutation could still score positive in a test for aneuploidy.

To determine whether these theoretical arguments made a difference in practice, biomarker performance was evaluated with the combined approaches, collectively called UroSEEK. As noted above, the ten-gene multiplex assay, the TERT singleplex assay, and the aneuploidy assay yielded sensitivities of 64%, 29%, and 39%, respectively, when used separately. Twenty-three samples without TERT promoter mutations tested positive for mutations in one of the other ten genes (Venn diagram in FIG. 32). Conversely, three samples without detectable mutations with the multiplex assay scored positive for TERT promoter mutations (FIG. 32). In addition, three of the urinary cell samples without any detectable mutations were positive for aneuploidy (FIG. 32). Thus, when the three assays were used together, and a positive result in any one assay was sufficient to score a sample as positive, the sensitivity rose to 75% (95% CI 62.2% to 84.6%). Only one of the 188 samples from healthy individuals scored positive in the UroSEEK test (specificity 99.5%, CI 97.5% to 100%).
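The arithmetic behind the combined sensitivity can be retraced with a short sketch. It assumes a Wilson score interval, which the text does not name, and the function name is illustrative only. The count of 13 samples positive in both mutation assays is inferred from the reported totals (39 mutation-positive cases, of which 23 lacked TERT mutations and 3 lacked multiplex mutations); it is not stated directly.

```python
import math

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Counts taken from the text for the 56 UTUC urinary cell samples.
multiplex_without_tert = 23   # multiplex-positive, no TERT promoter mutation
tert_without_multiplex = 3    # TERT-positive, no multiplex mutation
both_mutation_assays = 13     # inferred: 39 mutation-positive cases minus 23 minus 3
aneuploidy_only = 3           # aneuploid, no detectable mutation

# A sample is positive if any one assay is positive (logical OR).
positives = (multiplex_without_tert + tert_without_multiplex
             + both_mutation_assays + aneuploidy_only)
sensitivity = positives / 56
low, high = wilson_interval(positives, 56)
print(f"sensitivity {sensitivity:.0%}, interval {low:.1%} to {high:.1%}")
```

Under this assumption the interval lands within a fraction of a percentage point of the reported bounds, which suggests (but does not prove) that a score-type interval was used.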

To determine the basis for the increased sensitivity afforded by the combination assays, data from the primary tumors of the three patients whose urinary cell samples exhibited aneuploidy but did not harbor detectable mutations were evaluated. It was found that these three tumors did not contain any mutations in the 11 queried genes, explaining why these same assays were negative when applied to urinary cell DNA. As noted above, these three tumors were aneuploid, thus affording the opportunity to detect these copy number variations in the urinary cell samples.

Correlation with Clinical Features

A cancer biomarker should advantageously be able to detect tumors at an early stage, enabling surgical removal of the lesions prior to widespread metastasis. UroSEEK was sensitive in detecting both early and late tumors. It scored positive in 15 (79%) of 19 patients with stage Ta or T1 tumors and in 27 (73%) of 37 patients with stage T2-T4 tumors. Ten-year cancer-specific survival rates show that 91% of UTUC patients with stage T1 malignancies are expected to be cured by surgery, compared to only 78%, 34% and 0% of patients with stage T2, T3, or T4 tumors, respectively.

UroSEEK sensitivity was independent of a variety of clinical parameters other than tumor stage, including gender, CKD stage, tumor grade, tumor location and risk factors for developing UTUC, indicating that the assay is suitable for evaluation of diverse patient populations. Furthermore, UroSEEK was considerably more sensitive than urine cytology in this cohort. Cytology was available in 42 cases, and of these only four (9.5%) were diagnosed as carcinoma cytologically. Even if samples scored as “suspicious for malignancy” by cytology were considered as positive, the sensitivity was only 26% (including the four scored as positive and seven scored as suspicious). UroSEEK detected all four cases scored as positive by cytology, five of the seven cases scored as suspicious for malignancy, and 22 of the 31 samples scored by cytology as inconclusive or negative.

Example 6: Detection of Aneuploidy in Patients with Cancer Through Amplification of Long Interspersed Nucleotide Elements (LINEs)

This Example describes a new approach for amplicon-based aneuploidy detection. This approach, called WALDO (Within-Sample-AneupLoidy-DetectiOn), employs supervised machine learning to detect the small changes in multiple chromosome arms that are often present in cancers. It is shown herein that WALDO can be applied to identify chromosome arm gains or losses with improved sensitivity and equivalent specificity compared to previous approaches. Furthermore, machine learning can be incorporated to make genome-wide aneuploidy calls, in which samples are classified according to their aneuploidy status. This Example reports WALDO results on thousands of samples, including tissues of ten different tumor types as well as liquid biopsies of plasma from cancer patients. When two samples are available for comparison, WALDO can be used to assess genetic relatedness or to find somatic mutations within the LINEs. Thus, this approach can be used to provide an estimate of somatic mutation load, evaluate carcinogen signatures, and detect microsatellite instability (FIG. 1).

Materials and Methods

Samples

A total of 1,678 tumors were evaluated in this study (see below).

Sample Source | Sample Type | Number of Samples | Number of Samples Including Replicates | Matched Normal | Mutation Data
Peripheral white-blood-cell (WBC) | Normal | 176 | 677 | N/A | No
Tumor | Breast Invasive Carcinoma (BRCA) | 45 | 45 | No | No
Tumor | Colon Adenocarcinoma and Rectum Adenocarcinoma (COAD; COADREAD) | 536 | 536 | No | No
Tumor | Colorectal Adenoma | 32 | 32 | N/A | No
Tumor | Esophageal Carcinoma (ESCA) | 42 | 42 | No | No
Tumor | Head and Neck Squamous Cell Carcinoma (HNSC) | 96 | 96 | No | No
Tumor | Liver Hepatocellular Carcinoma (LIHC) | 56 | 56 | No | No
Tumor | Ovarian Serous Cystadenocarcinoma (OV) | 157 | 157 | No | No
Tumor | Pancreatic Adenocarcinoma (PAAD) | 345 | 345 | No | No
Tumor | Stomach Adenocarcinoma (STAD) | 28 | 28 | No | No
Tumor | Uterine Corpus Endometrial Carcinoma (UCEC) | 296 | 296 | No | No
Tumor (Cell Line) | Mismatch Repair Deficient Colorectal Carcinoma | 6 | 6 | Yes | No
Plasma | Normal | 402 | 566 | N/A | No
Plasma | Pancreatic Adenocarcinoma (PAAD) | 547 | 547 | No | Yes
Plasma | Breast Invasive Carcinoma (BRCA) | 28 | 28 | No | Yes
Plasma | Colon Adenocarcinoma and Rectum Adenocarcinoma (COAD; COADREAD) | 167 | 167 | No | Yes
Plasma | Esophageal Carcinoma (ESCA) | 17 | 17 | No | Yes
Plasma | Liver Hepatocellular Carcinoma (LIHC) | 54 | 54 | No | Yes
Plasma | Stomach Adenocarcinoma (STAD) | 16 | 16 | No | Yes
Plasma | Ovarian Serous Cystadenocarcinoma (OV) | 14 | 14 | No | Yes
Plasma | Lung | 113 | 113 | No | Yes

The numbers of cancers of each histopathologic subtype are listed in the Appendices. The tumors were formalin-fixed and paraffin-embedded (FFPE). In all cases, DNA was purified using QIAsymphony (cat #937255). Peripheral white blood cells (WBCs) were purified from the blood of 176 healthy individuals. Plasma was purified from 566 healthy individuals and 982 patients with cancer. DNA was purified from WBCs and plasma using Qiagen kits (cat #1091063 and cat #937255, respectively). The majority of the plasma samples used in this study had been independently evaluated for mutations in one of twelve commonly mutated genes. The fraction of mutant alleles in these plasma samples was used as an estimate of their neoplastic cell content. All individuals participating in the study provided written informed consent after approval by the institutional review boards of the hospitals at which the samples were collected.

FAST-SeqS

For each DNA sample evaluated, FAST-SeqS was used to amplify approximately 38,000 amplicons with a single primer pair (Kinde et al., 2012 PloS ONE 7:e41162). Massively parallel sequencing was performed on Illumina instruments (HiSeq 2500, HiSeq 4000, or MiSeq). During amplification, degenerate bases at the 5′ end of the primer were used as molecular barcodes to uniquely label each DNA template molecule as described elsewhere (see, e.g., Kinde et al., 2011 Proceedings of the National Academy of Sciences 108:9530-9535). This ensured that each DNA template molecule was counted only once. In all instances in this paper, the term “reads” refers to uniquely identified reads. Depending on the experiment, each read was sequenced between 1 and 20 times. For each WBC and tumor DNA sample, 100,000 to 25 million reads were used for analysis. For each plasma DNA sample, 100,000 to 15 million reads were used. Replicates of normal DNA were included in every sequencing experiment and used to evaluate stochastic and experimental variability.

Sample Alignment and Genomic Interval Grouping

Bowtie2 was used to align reads to human reference genome assembly GRCh37. A total of 37,669 exact matches (33,844 excluding the sex chromosomes) to the reference genome were identified. These exact matches allowed inclusion of common polymorphisms. The polymorphisms included 24,720 single nucleotide polymorphisms (SNPs) and 1,500 insertion and deletion (indel) polymorphisms with minor allele frequencies >1% in the 1000 Genomes database (Consortium 2012 Nature 491:56-65).

In light of experimental and stochastic variation, the number of reads that mapped to each genomic region of any euploid sample was expected to be variable. To minimize this variability, clusters of 500-kb genomic intervals with similar read depth across all chromosomes in multiple euploid samples were identified. This step permitted estimation of the expected variability in read depth in a sample when no aneuploidy was present. Genomic intervals smaller and larger than 500 kb were tested, and it was found that 500 kb yielded reasonable performance in the assays described below at reasonable computational expense.

Clustering of the 500-kb genomic intervals was performed as follows. Each test sample was matched to euploid samples that had similar amplicon sizes. This was done because smaller amplicons will be over-represented in the amplicons generated from DNA that is of small size prior to amplification. The size of the amplicons generated by FAST-SeqS ranges from 100 to 140 bp (Kinde et al., 2012 PloS ONE 7:e41162). The size of plasma DNA is 140 to 180 bp (Diehl et al., 2005 Proceedings of the National Academy of Sciences of the United States of America 102:16368-16373; Chan et al., 2004 Clinical chemistry 50:88-92; Jahr et al., 2001 Cancer research 61:1659-1665; and Giacona et al., 1998 Pancreas 17:89-97), so the largest LINE amplicons will be substantially underrepresented in plasma DNA compared to WBC DNA, for example. It was found that seven euploid samples were sufficient for comparison to any test sample; using more than seven euploid samples did not substantially increase performance. The seven euploid samples were derived from a collection of 677 WBC or 566 plasma DNA samples from normal individuals, collectively termed the “euploid reference set”. For each test sample p, the seven normal samples with the smallest Euclidean distance to p were selected, with the distance defined as $D(p,q)=\sqrt{\sum_{n}(q_{n}-p_{n})^{2}}$, where $p_{n}$ and $q_{n}$ are the fractions of amplicons of size n in samples p and q, and the sum is over all amplicon sizes in the two samples. Before calculating the Euclidean distances between the test samples and the samples in the euploid reference set, the following amplicons were excluded: (i) using maximum likelihood estimates, the amplicons were ranked by variance among the seven euploid samples and the top 1% were excluded; and (ii) any amplicons with <10 reads in one sample but >50 reads in any of the other six samples were removed. In each sample, the 500-kb genomic intervals were scaled by subtracting the mean and dividing by the standard deviation of reads in each sample.
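The reference-matching step can be illustrated with a minimal sketch of the distance D(p,q) and the selection of the closest euploid samples. The function names, the dictionary layout, and the toy data are hypothetical; the amplicon exclusion filters described above are omitted.

```python
import math

def size_distance(p, q):
    """Euclidean distance between two amplicon-size distributions.

    p and q map amplicon size (bp) to the fraction of amplicons of that size.
    A size absent from one sample is treated as a fraction of 0.
    """
    sizes = set(p) | set(q)
    return math.sqrt(sum((q.get(n, 0.0) - p.get(n, 0.0)) ** 2 for n in sizes))

def nearest_euploid(test, reference_set, k=7):
    """Return the names of the k reference samples closest to the test sample."""
    ranked = sorted(reference_set,
                    key=lambda name: size_distance(test, reference_set[name]))
    return ranked[:k]

# Toy data (hypothetical): fraction of amplicons at three sizes.
test_sample = {100: 0.5, 120: 0.3, 140: 0.2}
reference = {
    "wbc_1": {100: 0.2, 120: 0.3, 140: 0.5},
    "plasma_1": {100: 0.5, 120: 0.3, 140: 0.2},
    "plasma_2": {100: 0.4, 120: 0.4, 140: 0.2},
}
closest = nearest_euploid(test_sample, reference, k=2)
print(closest)
```

In this toy case the short-fragment test sample is matched to the two plasma-like references rather than to the WBC-like one, which is the behavior the matching step is meant to produce.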

The scaled 500-kb genomic intervals were then clustered across the seven selected normal samples in the following way. First, each 500-kb genomic interval i was assigned to a primary cluster C_(i). Next, the reads in genomic interval i across all samples were compared to the average number of reads in the seven samples in all other genomic intervals i′ that occurred on the remaining 21 autosomal chromosomes. Similarity was defined as the absence of a significant difference (paired t-test p>0.05, F-test p>0.05). If the average number of reads in genomic interval i′ was not significantly different from the number of reads in genomic interval i, it was added to cluster C_(i). This process was repeated for each of the 4361 genomic intervals, yielding 4361 clusters. Every interval i belonged to its primary cluster, but the same interval also belonged to an average of 176 other clusters (range of 100 to 252 clusters among 190 representative samples). The number of unique clusters was less than 4361 because some clusters were composed of the same 500-kb genomic intervals. The number of unique clusters was typically 4310 to 4330 among 190 representative samples. Clusters contained an average of approximately two hundred 500-kb genomic intervals (see FIG. 45). Scaled reads were not randomly distributed (see FIG. 46A). However, the distribution of scaled reads within the ˜200 genomic intervals in each cluster followed an approximately normal distribution (example in FIG. 46B-C).

Identifying Chromosome Arm Gains or Losses in a Test Sample

WALDO used the seven euploid samples described above only to define clusters of genomic intervals with similar amplification properties. The statistical tests for aneuploidy in WALDO were based on the read distributions within the test sample and were independent of the read distributions in any euploid sample. For a test sample, maximum likelihood was used to estimate the means μ and variances σ² of the genomic intervals in each of the 4,361 clusters defined by the seven euploid samples that were chosen to match it on the basis of amplicon length. The robustness of these estimates was improved by iteratively removing outlying genomic intervals within the test sample from the clusters. Clusters containing fewer than 10 genomic intervals were not included in the analysis. For each cluster, any 500-kb genomic interval meeting the criterion $\min\left(2\,\mathrm{CDF}(\mu_{i},\sigma_{i}^{2}),\ 2\,(1-\mathrm{CDF}(\mu_{i},\sigma_{i}^{2}))\right)<0.01$ was removed from all clusters. Next, the μ and σ² parameters of each cluster were re-estimated by maximum likelihood. The two steps were repeated until no outlying genomic intervals remained. The statistical significance of the total reads was then estimated from all 500-kb genomic intervals on the arm. Because sums of normally distributed random variables are also normally distributed random variables, the calculation was straightforward (see FIG. 47). For each chromosome arm, $\sum_{i=1}^{I}R_{i}\sim N\left(\sum_{i=1}^{I}\mu_{i},\ \sum_{i=1}^{I}\sigma_{i}^{2}\right)$ was calculated, where $R_{i}$ is the scaled reads and I is the number of clusters on the arm. Z-scores were produced using the quantile function $1-\mathrm{CDF}\left(\sum_{i=1}^{I}\mu_{i},\ \sum_{i=1}^{I}\sigma_{i}^{2}\right)$. Positive Z-scores>α represented gains and negative Z-scores<−α represented losses, where α was the selected significance threshold.
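The arm-level test can be sketched in a few lines, assuming the cluster means and variances have already been estimated and treating the scaled reads as independent so that variances add. The function names and example numbers are hypothetical, and the iterative outlier removal is not shown.

```python
import math
from statistics import NormalDist

def arm_z_score(scaled_reads, cluster_means, cluster_variances):
    """Standardize the summed scaled reads on one chromosome arm.

    The sum of the R_i is compared with N(sum of mu_i, sum of sigma_i^2).
    Returns the Z-score and the upper-tail probability 1 - CDF.
    """
    observed = sum(scaled_reads)
    mean = sum(cluster_means)
    sd = math.sqrt(sum(cluster_variances))
    z = (observed - mean) / sd
    upper_tail = 1 - NormalDist(mean, sd).cdf(observed)
    return z, upper_tail

def call_arm(z, alpha):
    """Gain if Z > alpha, loss if Z < -alpha, otherwise no call."""
    if z > alpha:
        return "gain"
    if z < -alpha:
        return "loss"
    return "neutral"

# Hypothetical arm with four intervals, each shifted upward by 0.5 scaled units.
z, tail = arm_z_score([0.5] * 4, [0.0] * 4, [0.01] * 4)
print(call_arm(z, alpha=4))
```

Because the per-interval shifts accumulate while the standard deviation grows only with the square root of the number of intervals, small consistent shifts across an arm can produce a large Z-score.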

Arm Level Allelic Imbalance

Common polymorphisms from 1000 Genomes (24,720 single nucleotide and 1,500 indels, MAF>1%) were used as candidate heterozygous sites. For each of the 677 normal samples, polymorphic sites were identified that could be confidently called as heterozygous and diploid. These sites were defined as polymorphisms with variant-allele frequencies (VAF) of 0.4<VAF<0.6, where VAF=#non-reference reads/total reads. VAFs were modeled at these sites as random variables taken from a normal distribution with μ=0.5; the variance σ² was estimated by maximum likelihood as a function of read depth (FIG. 48). To determine whether the alleles on a chromosome arm in a test sample were unbalanced, the subset of polymorphic sites was identified at which both alleles were present and in which the sum of the reads on both alleles was >25. The observed VAF was then compared with the normal distribution, using the expected variance for the observed read depth, yielding a two-sided P-value. All P-values on a chromosome arm were Z-transformed and combined with a weighted Stouffer's method, with the observed read depth at each site used as its weight. The formula used for this calculation was

$Z \sim \frac{\sum_{i=1}^{k} w_{i} Z_{i}}{\sqrt{\sum_{i=1}^{k} w_{i}^{2}}},$

where $w_{i}$ is the UID depth at variant i, $Z_{i}$ is the Z-score of variant i, and k is the number of variants observed on the chromosome arm. A chromosome arm was scored as having an allelic imbalance if the resulting Z score was greater than the selected statistical significance threshold α (one-sided test).
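A minimal sketch of the weighted Stouffer combination follows. It assumes each two-sided P-value is converted to a non-negative Z-score through the inverse normal CDF at p/2, which is one common convention and is not spelled out in the text. The standard deviation of the VAF at a given depth is passed in as an argument because the depth-dependent variance model (FIG. 48) is not reproduced here. All names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def vaf_p_value(vaf, sd):
    """Two-sided P-value for an observed VAF under N(0.5, sd^2)."""
    z = abs(vaf - 0.5) / sd
    return 2 * (1 - STD_NORMAL.cdf(z))

def weighted_stouffer(p_values, weights):
    """Combine two-sided P-values into one Z-score, weighting by read depth."""
    # Assumed transform: Z_i = -inv_cdf(p_i / 2); tiny P-values are clamped
    # so the inverse CDF stays defined.
    z_scores = [-STD_NORMAL.inv_cdf(max(p, 1e-300) / 2) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

# Hypothetical arm with three heterozygous sites: (VAF, assumed sd, UID depth).
sites = [(0.62, 0.05, 40), (0.35, 0.06, 30), (0.58, 0.04, 80)]
p_values = [vaf_p_value(vaf, sd) for vaf, sd, _ in sites]
depths = [depth for _, _, depth in sites]
arm_z = weighted_stouffer(p_values, depths)
print(round(arm_z, 2))
```

With this convention the combined score is never negative, which is consistent with the one-sided comparison against α described above.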

Generation of Synthetic Aneuploid Samples

Data from 63 presumably euploid samples were selected, each containing at least 9 million reads, and each derived from the DNA of normal WBCs. Synthetic aneuploid samples were created by adding (or subtracting) reads from several chromosome arms to the reads from these normal DNA samples. Reads from 1, 5, 10, 15, 20, or 25 randomly selected chromosome arms were added to or subtracted from each sample. The additions and subtractions were designed to represent neoplastic cell fractions ranging from 0.5% to 10% and resulted in synthetic samples containing exactly nine million reads. The reads from each chromosome arm were added or subtracted uniformly. For example, when five chromosome arms that were lost were modeled, each was lost to the identical degree and we did not incorporate tumor heterogeneity into the model. Furthermore, synthetic samples containing two or more of the same extra chromosome arms were not created, such as synthetic cells containing 4 copies of chromosome 3p. This simplified approach did not comprehensively cover all biologically plausible aneuploidy events. However, limiting the possible combinations of altered arms made sample generation computationally tractable, and the resulting support vector machine worked well in practice.

The synthetically generated samples in which reads from only a single chromosome arm were added or subtracted enabled us to estimate the performance of WALDO when only a single chromosome arm of interest was gained or lost. The synthetic set in which 5-25 chromosome arms were altered permitted assessment of the performance of WALDO in typical samples derived from cancers. As shown in FIG. 37, most cancers have gains or losses of multiple chromosomes. The algorithms used to generate the synthetic samples are shown as pseudocode in FIG. 49 and FIG. 50.
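The pseudocode of FIG. 49 and FIG. 50 is not reproduced here. As a rough stand-in, the sketch below shows one way to build a synthetic sample from arm-level read counts. It rests on assumptions that are not taken from the figures: a single-copy gain or loss in a neoplastic fraction f scales the reads of an arm by (1 + f/2) or (1 − f/2), and the result is rescaled deterministically to a fixed total instead of being resampled. All names are hypothetical.

```python
def synthetic_aneuploid(arm_reads, altered, fraction, total=None):
    """Create arm-level read counts for a synthetic aneuploid sample.

    arm_reads: dict of arm name -> read count in the euploid sample
    altered:   dict of arm name -> +1 (gain) or -1 (loss); other arms unchanged
    fraction:  assumed neoplastic cell fraction, e.g. 0.05 for 5%
    total:     read total of the output (defaults to the input total)
    """
    if total is None:
        total = sum(arm_reads.values())
    # Assumed model: one copy gained or lost out of two in the neoplastic cells.
    scaled = {arm: n * (1 + altered.get(arm, 0) * fraction / 2)
              for arm, n in arm_reads.items()}
    norm = total / sum(scaled.values())
    exact = {arm: value * norm for arm, value in scaled.items()}
    counts = {arm: int(value) for arm, value in exact.items()}
    # Hand out the reads lost to rounding down, largest remainder first,
    # so the output contains exactly `total` reads.
    leftover = max(total - sum(counts.values()), 0)
    by_remainder = sorted(exact, key=lambda arm: exact[arm] - counts[arm],
                          reverse=True)
    for arm in by_remainder[:leftover]:
        counts[arm] += 1
    return counts

# Toy euploid sample with four arms (hypothetical counts).
euploid = {"1p": 1000, "1q": 1000, "2p": 1000, "2q": 1000}
sample = synthetic_aneuploid(euploid, {"1q": +1, "2p": -1}, fraction=0.10)
print(sample)
```

Keeping the total fixed mirrors the constraint that every synthetic sample contains the same number of reads, so a gain on one arm necessarily lowers the share of reads on the others.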

Genome-Wide Aneuploidy Detection

A two-class support vector machine (SVM; Cortes 1995 Machine learning 20:273-297) was trained to discriminate between euploid samples and the synthetic samples in which the reads from 5 to 25 chromosome arms were added or subtracted. The training set contained 677 WBC negative samples (presumably euploid WBCs containing 3 million-15 million reads) and 3150 positive samples, all synthetic as described above. SVM training was done with the e1071 package in R, using a radial basis kernel and default parameters (Meyer et al., 2015 R package version:1.6-3). Each sample had 39 Z-score features, representing chromosome arm gains and losses. A total of 677 synthetic samples were randomly sampled so that the sizes of the negative and positive classes were equivalent, and this was repeated ten times. Each sample to be classified was scored by all ten SVMs, and the ten scores were averaged to yield a final score.

The number of reads from the data on experimental samples can vary widely, particularly when the samples are derived from sources with limited amounts of DNA such as plasma. Samples with low reads can generate artificially high SVM scores if read depth is not taken into account. Read depth was therefore controlled for by modeling the change in SVM scores as a function of read depth in the normal samples. In particular, each of the 63 WBC euploid samples was randomly down-sampled to yield ten replicates of lower-read euploid samples at read depths ranging from 100,000 to 9 million. All down-sampled euploid samples were scored using the 10 SVMs, and the scores were averaged. This procedure yielded 630 SVM scores for the down-sampled euploid samples at each read depth. All scores were converted to ratios by finding the sample at each read depth with the minimum SVM score and dividing all scores at the same depth by that value. The average ratio r at each depth decreased monotonically as a function of increasing read depth (FIG. 51). The relation between read depth x and the ratio r was modeled using the following equation, with A=−7.076×10⁻⁷ and B=−1.946×10⁻¹:

$\log\left( 1 - \frac{1}{r} \right) = Ax + B.$

Raw SVM scores were corrected by dividing by the ratio r obtained from this equation.
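Solving the fitted relation for r gives r = 1/(1 − exp(Ax + B)), which can be applied directly to a raw score. The sketch below assumes that x is the read depth of the sample and that the logarithm is natural; neither is stated explicitly. The function names are illustrative.

```python
import math

A = -7.076e-7   # slope reported in the text
B = -1.946e-1   # intercept reported in the text

def expected_ratio(read_depth):
    """Solve log(1 - 1/r) = A*x + B for r (natural logarithm assumed)."""
    return 1.0 / (1.0 - math.exp(A * read_depth + B))

def corrected_score(raw_score, read_depth):
    """Divide a raw SVM score by the ratio expected at this read depth."""
    return raw_score / expected_ratio(read_depth)

for depth in (100_000, 1_000_000, 9_000_000):
    print(depth, round(expected_ratio(depth), 3))
```

Under these assumptions the ratio falls steadily as depth increases and approaches 1 near nine million reads, so the correction mainly shrinks the scores of low-depth samples.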

To score a sample as aneuploid, it was first determined whether any single chromosome arm in it was lost or gained in a statistically significant manner. A statistically significant gain of a single chromosome arm was defined as one whose Z-score was >4σ above the maximum Z-score observed in the 677 normal WBC samples. Similarly, a statistically significant loss of a single chromosome arm was defined as one whose Z-score was >4σ below the minimum Z-score observed in the 677 normal WBC samples. Allelic imbalance based on SNPs was defined for a chromosome arm whose Z-score was >4σ above the maximum Z-score observed in the 677 normal WBC samples. Only samples in which no single chromosome arm was gained or lost when defined in this way were subjected to SVM analysis. The rationale for this process was that the SVM was designed to identify samples with large numbers of chromosome arm gains or losses but relatively low neoplastic cell fractions. The SVM was not designed to detect aneuploidy in samples with neoplastic cell fractions>10%, which were easily identified through evaluation of their Z-scores and comparison to the 677 normal samples as described in the first part of this paragraph.

Somatic Sequence Mutations and Microsatellite Instability (MSI)

When matched normal samples were available, an attempt was made to detect somatic single base substitution (SBS) and insertion and deletion (indel) mutations based on LINE amplicon sequences and alignments. In such cases, the molecular barcoding approach for error reduction was used. For SBS, only amplicons that had at least 200 reads and 50 unique molecular barcodes were considered. For indels, reads that were observed in at least two clusters on the sequencing instrument were considered. The SBS mutations were identified by directly comparing amplicons from the test sample with amplicons from the matched normal, and did not require any alignment to the reference genome. Amplicons with fewer than 50 reads in the matched normal sample were excluded. A somatic SBS was defined as one in which at least five reads from the test sample differed from any normal read by exactly one nucleotide substitution.

Indels were called in a similar way. Amplicons from the test sample and matched normal sample were first aligned to the reference genome (GRCh37) with Bowtie2 (Langmead 2012 Nature Methods 9:357-359). A somatic indel was defined as one in which at least ten reads from the test sample differed from any normal read by virtue of the same insertion or deletion.

Microsatellite instability in a test sample was determined by counting the number of somatic indels in mononucleotide tracts of >3 nucleotides. There were 17,488 of these mononucleotide tracts in the LINE amplicons that were studied. It was expected that somatic indels in monotracts would be rare in a normal sample. Therefore, the null distribution of counts could be modeled as Poisson (λ=1), where λ is the mean number of somatic indels in a monotract in a normal sample. A sample was called as harboring MSI if the number of somatic indels was statistically significant. To evaluate how often normal samples would be scored as MSI using this process, the total reads in normal samples were randomly split into two equal partitions. The first partition was used as the reference sample and the second partition was used as a test sample.
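The Poisson test for MSI can be sketched as an upper-tail probability under λ=1. The significance cutoff of 0.01 used below is an assumption chosen for illustration, since the text does not give the threshold, and the function names are hypothetical.

```python
import math

def poisson_upper_tail(k, lam=1.0):
    """P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return 1.0 - below

def is_msi(somatic_indel_count, alpha=0.01, lam=1.0):
    """Call MSI when the indel count is improbably high under the null.

    alpha is an assumed cutoff, not a value taken from the text.
    """
    return poisson_upper_tail(somatic_indel_count, lam) < alpha

for count in range(7):
    print(count, f"{poisson_upper_tail(count):.4f}", is_msi(count))
```

With λ=1 and this assumed cutoff, a handful of somatic indels in monotracts is already enough to reject the null, which reflects how rare such events are expected to be in a normal sample.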

Sample Matching

To compare one sample to another, amplicons were first aligned to the reference genome GRCh37 with Bowtie2. The 1000 Genomes common polymorphisms were used to identify the genotypes at 26,220 sites in each sample. Each polymorphic site was called as “0” (homozygous reference, >0.95 of UIDs matching the reference allele, minimum of ten reads), “1” (heterozygous, 0.05-0.95 of UIDs matching either the reference or the alternate allele, minimum of ten reads of each allele), or “2” (homozygous alternate, >0.95 of UIDs matching the alternate allele, minimum of ten reads). Concordance was defined as the number of matched polymorphic sites that were identical in both samples (i.e., were both “0”, both “1”, or both “2”) divided by the total number of genotypes that had adequate coverage in both samples. Two samples were considered a match if concordance was >0.98 and at least 15,000 amplicons had adequate coverage.
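The genotype calls and the concordance measure can be sketched as follows. The data layout (a dictionary from site to reference and alternate read counts) and the function names are hypothetical, and the coverage requirement of 15,000 amplicons is applied here to the number of sites genotyped in both samples, which is an interpretation.

```python
def call_genotype(ref_reads, alt_reads, min_reads=10):
    """Return 0, 1, or 2, or None when coverage is inadequate."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    ref_fraction = ref_reads / total
    if ref_fraction > 0.95:                      # homozygous reference
        return 0 if ref_reads >= min_reads else None
    if ref_fraction < 0.05:                      # homozygous alternate
        return 2 if alt_reads >= min_reads else None
    # Heterozygous: at least min_reads on each allele.
    return 1 if min(ref_reads, alt_reads) >= min_reads else None

def concordance(sample_a, sample_b):
    """Fraction of identical genotypes among sites callable in both samples."""
    shared = same = 0
    for site in sample_a.keys() & sample_b.keys():
        genotype_a = call_genotype(*sample_a[site])
        genotype_b = call_genotype(*sample_b[site])
        if genotype_a is None or genotype_b is None:
            continue
        shared += 1
        if genotype_a == genotype_b:
            same += 1
    return (same / shared if shared else 0.0), shared

def is_match(sample_a, sample_b, min_concordance=0.98, min_sites=15000):
    value, shared = concordance(sample_a, sample_b)
    return value > min_concordance and shared >= min_sites

# Toy samples: site -> (reference reads, alternate reads).
a = {"s1": (100, 0), "s2": (50, 50), "s3": (0, 40), "s4": (5, 0)}
b = {"s1": (90, 1), "s2": (30, 30), "s3": (40, 0), "s4": (100, 0)}
print(concordance(a, b))
```

Sites that fail the coverage rule in either sample drop out of both the numerator and the denominator, so low-depth samples are penalized through the minimum-site requirement and not through the concordance value itself.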

TCGA Somatic Copy Number Alterations

The most recent Cancer Genome Atlas Level 4 somatic copy number alteration files were downloaded from Firehose (Aggregate_AnalysisFeatures.Level_4.2016012800.0.0); these files derive from GISTIC analysis (Beroukhim et al., 2007 Proceedings of the National Academy of Sciences 104:20007-20012) of Affymetrix SNP6 arrays. Nine TCGA tumor types were selected that matched our cohorts of primary tumor samples: breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), colon or rectal adenocarcinoma (COADREAD), esophageal carcinoma (ESCA), head and neck squamous cell carcinoma (HNSC), liver hepatocellular carcinoma (LIHC), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), stomach adenocarcinoma (STAD), and uterine corpus endometrial carcinoma (UCEC). A total of 6276 samples were available (breast invasive carcinoma (BRCA): 1081, colon adenocarcinoma, colon or rectal adenocarcinoma (COAD;COADREAD): 2755, esophageal carcinoma (ESCA): 185, head and neck squamous cell carcinoma (HNSC): 523, liver hepatocellular carcinoma (LIHC): 371, ovarian serous cystadenocarcinoma (OV): 580, pancreatic adenocarcinoma (PAAD): 185, stomach adenocarcinoma (STAD): 442, uterine corpus endometrial carcinoma (UCEC): 540). The data consisted of GISTIC log₂ ratios representing gains or losses of each chromosome arm (Mermel et al., 2011 Genome biology 12:R41). Ratios >0.2 or <−0.2 were considered gains or losses, respectively (Laddha et al., 2014 Molecular cancer research 12:485-490).

Comparison with a Prior Technique for Detecting Single Chromosome ArmAlterations

The fraction of reads that mapped to each chromosome arm in each of the 677 WBC samples, and their averages and standard deviations, were calculated (Kinde et al., 2011 Proceedings of the National Academy of Sciences 108:9530-9535). The score for each arm was computed as $z_{i,\mathrm{chrN}}=(\mathrm{chrN}_{i}-\mu_{\mathrm{chrN}})/\sigma_{\mathrm{chrN}}$, where $\mathrm{chrN}_{i}$ represents the normalized read counts for that chromosome arm and $\mu_{\mathrm{chrN}}$ and $\sigma_{\mathrm{chrN}}$ represent the mean and standard deviation of the normalized read counts.
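For contrast with WALDO, the prior between-sample score can be sketched in a few lines. The sketch assumes the sample standard deviation (the text does not say whether the sample or population form was used), and the reference fractions are invented for illustration.

```python
from statistics import mean, stdev

def arm_fraction_z(test_fraction, reference_fractions):
    """Z-score of a test sample's arm read fraction against normal samples."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)   # sample standard deviation (assumed)
    return (test_fraction - mu) / sigma

# Hypothetical fractions of reads mapping to one arm in four normal samples.
normal_fractions = [0.040, 0.042, 0.038, 0.040]
z_arm = arm_fraction_z(0.050, normal_fractions)
print(round(z_arm, 2))
```

Because the reference mean and standard deviation come from other samples, any batch difference between the test sample and the normals feeds directly into this score, which is the dependence the within-sample approach is designed to avoid.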

Results

Statistical Principles Underlying WALDO

Unlike most conventional approaches for assessing copy number changes, WALDO does not compare normalized read counts from each chromosome arm in a test sample to the fraction of reads in each chromosome arm in other samples. Such conventional comparisons are subject to batch effects and other artifacts associated with variables that are difficult to control. To evaluate whole genome sequencing data, aneuploidy was detected by comparing the read counts of LINEs within 4361 genomic intervals, each containing 500 kb of sequence. The read counts within the 500-kb genomic intervals within a sample were only compared to the read counts of other genomic intervals within the same sample (hence the “Within-Sample” designation in WALDO).

In euploid samples, the number of LINE reads within each 500-kb genomic interval should track with the number of reads in certain other genomic regions. Genomic intervals that track together do so because the amplicons within them amplify to similar extents. Here, such genomic regions that track together are called “clusters”. It is possible to identify clusters from sequencing data on euploid samples. In a test sample, it is determined whether the number of reads in each genomic interval in each pre-defined cluster is within the expected bound of the other clusters from that same sample. If the reads within a genomic interval are outside the statistically expected bound, and there are many such outliers on the same chromosome arm, then that chromosome arm is classified as aneuploid. The statistical basis of this test is described in the Materials and Methods. In brief, while the number of reads at each LINE is not randomly distributed across the genome, the distribution of scaled reads within each cluster is approximately Normal. A convenient property of Normal distributions is that the sum of multiple Normally distributed variables is also Normally distributed. It is thus possible to compute the theoretical mean and variance of the summed reads on each chromosome arm simply by summing the means and variances of all the clusters represented on that chromosome arm.

WALDO also employs several other innovations that make it applicable to the analysis of PCR-generated amplicons from clinical samples. One of these innovations is controlling amplification bias stemming from the strong dependence of the data on the size of the initial template. Another is the use of a Support Vector Machine (SVM) to enable the detection of aneuploidy in samples containing low neoplastic fractions. The conceptual and statistical bases for WALDO are detailed in the Materials and Methods section herein.

Evaluation of Chromosome Arm Gains and Losses in Primary Tumor Samples

WALDO was first used to study chromosome arm gains and losses in 1,677 primary tumor samples from ten cancer types. One of the outputs of WALDO is a z-score for each of the 39 non-acrocentric arms on the autosomal chromosomes. The z-scores for each of these chromosome arms in each of the primary tumor samples evaluated in this study are provided in Table 35. These results were compared with those obtained by The Cancer Genome Atlas (TCGA) on independent samples of the same tumor types (Zack et al., 2013 Nature genetics 45:1134-1140; and Beroukhim et al., 2007 Proceedings of the National Academy of Sciences 104:20007-20012). The fraction of tumor samples having a gain or a loss in each chromosome arm was identified in our data and in TCGA, considering all tumor types together and each tumor type individually. The top half of FIG. 37 shows, for each chromosome arm, the fraction of samples across all cancer types scored as a gain by WALDO or by TCGA's algorithm (GISTIC), and the bottom half shows the fraction of samples scored as a loss. The correlations between arm-level gains scored in this study and those in TCGA are shown in FIG. 39A (R²=0.45), and the correlations for arm-level losses are shown in FIG. 39B (R²=0.39). Considering that the samples were from completely different patients, the specific chromosome arms gained and lost in both datasets were remarkably similar. The chromosome arms with the most gains were 1q, 3q, 7p, 7q, 8q, and 20q, and relatively few losses were observed on these arms. Those with the most losses were 4p, 4q, 8p, and 18q, and relatively few gains were observed on these arms. The arms with the fewest gains or losses were 10p, 16p, 19p, and 19q.

Similarly high correlations were observed for many of the specific tumor types in those cases in which a sufficient number of cancers were available for comparison (see below, and FIG. 40).

Tumor type       Gain Correlation   Loss Correlation   WALDO Samples   GISTIC Samples
BRCA             0.629              0.436              89              181
COAD; COADREAD   0.582              0.428              536             2755
ESCA             0.043              0.0742                             185
HNSC             0.537              0.344              96              523
LIHC             0.64               0.287              56              371
OV               0.067              0.123              157             580
PAAD             0.384              0.702              345             185
STAD             0.552              0.555              28              442
UCEC             0.325              0.165              296             540

The highest correlations were for pancreatic adenocarcinomas and liver cancers (R²=0.70 and R²=0.64, respectively). An interesting outcome of this analysis was the large number of chromosome arms that were aneuploid in the great majority of cancer cases. The median number of chromosome arms that were lost or gained per cancer was 14, with an interquartile range of 5 to 22. This large number was used for the development of the Support Vector Machine described below.

Thirty-two benign tumors of the colon (colorectal adenomas) were also evaluated. It was found that 25 of them displayed gains or losses of chromosomes. The median number of chromosome arms that were lost or gained per benign tumor was 4, with an interquartile range of 1 to 9.75. No benign tumors have yet been studied by TCGA, so comparison was not possible. However, comparison to colorectal cancers showed that the benign tumors had many fewer chromosome arm changes than observed in cancers. Additionally, the chromosome arms altered in the adenomas overlapped with those in the cancers, and the directionality of the changes (gains vs. losses) was preserved.

WALDO also allows determination of allelic imbalances based on the SNPs within the LINEs that are concomitantly sequenced. This provides a measure of chromosome arm changes that is totally independent of the one provided by the number of reads across the 500-kb genomic intervals. Note that measurements of allelic imbalance represent the ratios between the number of reads of the reference allele vs. those of the variant allele. This ratio will be the same whether the chromosome arm containing the reference allele is gained or the arm containing the variant allele is lost. Nevertheless, without wishing to be bound by theory, one would expect that there would be a strong relationship between the chromosomes exhibiting allelic imbalances and those exhibiting either gains or losses in the same tumor. It was found that 63% of chromosome arms with allelic imbalance also had a significant gain or loss at the same chromosome arm. Other uses of the SNPs within the LINEs are described herein.

Next, the sensitivity and specificity of WALDO in calling single chromosome arm gains or losses were compared with those of the previous method (see Materials and Methods). Both methods were applied to LINE amplicon sequencing data from 677 normal peripheral white blood cell (WBC) samples, with each WBC sample independently amplified and sequenced to an average depth of 9.5M reads. These experimental data were augmented by 24,570 synthetic samples with single chromosome alterations (see Materials and Methods). Sensitivity was computed as the total number of correctly identified altered arms divided by the total number of altered arms in the synthetic samples. Specificity was computed as 1 minus the total number of incorrectly called altered arms divided by the total number of normal arms in the experimental data from the normal WBC samples. For both WALDO and the previous method, three significance thresholds (±1.96, ±3.0, ±5.0) and three neoplastic cell fractions (1%, 5%, 10%) were considered. For all thresholds and neoplastic cell fractions, WALDO had higher specificity and sensitivity (see below, and Table 34).

Dilution   Threshold          WALDO Sensitivity   WALDO Specificity   Z Score Sensitivity   Z Score Specificity
0.010      >1.96 or <−1.96    0.221               0.969               0.144                 0.952
0.010      >3 or <−3          0.031               0.999               0.020                 0.995
0.010      >5 or <−5          0.000               1.000               0.000                 1.000
0.050      >1.96 or <−1.96    0.969               0.969               0.899                 0.952
0.050      >3 or <−3          0.917               0.999               0.748                 0.995
0.050      >5 or <−5          0.671               1.000               0.443                 1.000
0.100      >1.96 or <−1.96    0.999               0.969               0.988                 0.952
0.100      >3 or <−3          0.995               0.999               0.957                 0.995
0.100      >5 or <−5          0.957               1.000               0.839                 1.000
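The sensitivity and specificity definitions used for this comparison can be written out as a short sketch. It assumes each chromosome arm is summarized by one z-score and is called altered when the absolute z-score exceeds the threshold; the function names and example values are illustrative and are not part of WALDO.

```python
def call_altered(z, threshold):
    """An arm is called altered when its z-score lies outside ±threshold."""
    return abs(z) > threshold

def sensitivity(z_of_altered_arms, threshold):
    """Correctly identified altered arms / all altered arms (synthetic samples)."""
    called = sum(call_altered(z, threshold) for z in z_of_altered_arms)
    return called / len(z_of_altered_arms)

def specificity(z_of_normal_arms, threshold):
    """1 - (incorrectly called arms / all normal arms) in normal WBC samples."""
    false_calls = sum(call_altered(z, threshold) for z in z_of_normal_arms)
    return 1 - false_calls / len(z_of_normal_arms)

# Illustrative values only.
example_sensitivity = sensitivity([2.5, -3.2, 1.0, 6.0], threshold=3.0)
example_specificity = specificity([0.1, -2.0, 0.5, 1.0], threshold=1.96)
```

Raising the threshold trades sensitivity for specificity, which is the pattern visible across the ±1.96, ±3, and ±5 rows of the table above.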

To further evaluate the ability of WALDO to detect single chromosome abnormalities, DNA from patients with trisomy 21 was also evaluated. DNA from individuals with trisomy 21 was physically mixed with normal DNA at a ratio of 2 ng of normal DNA to 0.2 ng of trisomy 21 DNA. The mixtures were created to replicate typical fetal fractions in noninvasive prenatal testing (approximately 10%). Using polymorphisms in the LINE amplicons, the trisomy admixture rate of the samples (range 7.7%-10.4%) was estimated. Using a z threshold of 2.5, it was found that as few as 2M reads could detect trisomy 21 at fetal fractions typically observed (sensitivity 95%). Sixteen normal WBC samples were then sampled at various read depths. At 2M reads using the same threshold, the specificity was 100%. Sensitivities and specificities at other read depths and other admixtures of trisomy 21 samples are summarized in FIG. 41.
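As a worked check of the mixing ratio, the following sketch computes the nominal admixture fraction for 2 ng of normal DNA and 0.2 ng of trisomy 21 DNA, and the relative chromosome 21 dosage that such a mixture would be expected to show. It assumes the admixture fraction equals the mass fraction and ignores the slightly larger genome mass of trisomic cells; the function names are hypothetical.

```python
def admixture_fraction(normal_ng, trisomy_ng):
    """Nominal fraction of trisomy DNA in the mixture, by mass."""
    return trisomy_ng / (normal_ng + trisomy_ng)

def expected_dosage_ratio(fraction, copies_trisomy=3, copies_normal=2):
    """Expected chromosome 21 dosage in the mixture relative to a euploid sample."""
    mixed_copies = (1 - fraction) * copies_normal + fraction * copies_trisomy
    return mixed_copies / copies_normal

fraction = admixture_fraction(normal_ng=2.0, trisomy_ng=0.2)   # about 0.091
ratio = expected_dosage_ratio(fraction)                        # about 1.045
```

The nominal value of about 9.1% falls within the 7.7%-10.4% range estimated from the LINE polymorphisms, and the corresponding excess of chromosome 21 reads is only about 4.5%, which illustrates why read depth matters for detection.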

Aneuploidy Detection in Samples with Low Fractions of Neoplastic Cell DNA

Many potential applications of aneuploidy detection in cancer involve identifying a relatively small fraction of DNA from neoplastic cells within a large pool of DNA derived from normal cells. One notable application is liquid biopsy, i.e., the evaluation of bodily fluids such as urine, saliva, cyst fluid, or sputum for evidence of cancer. Given that aneuploidy is a general feature of cancers of virtually all types (see FIG. 37), detecting aneuploidy could be used for this purpose.

To employ WALDO for liquid biopsies, a 2-stage approach was used. The first stage employed a search for individual chromosome arm gains or losses or allelic imbalance, as described above. Simulations with synthetic DNA showed that this approach could detect an individual chromosome arm gain or loss with sensitivities >90% at specificities >99% when the fraction of DNA contributed by the neoplastic cells was >5% of the total DNA. To detect aneuploidy in samples with lower fractions of neoplastic cell DNA, the fact that the median number of chromosome arm gains or losses per tumor was high (Kinde et al., 2012 PloS ONE 7:e41162) was exploited. A variety of approaches to distinguish samples containing low fractions of neoplastic DNA with multiple chromosome abnormalities from euploid samples was therefore considered. These approaches included counting the number of significant arms, combining scores of the most significant arms, and summing squared window-based Z-scores. Based on synthetic samples, it was found that the optimum approach was obtained with a Support Vector Machine (among many machine learning algorithms tested). The Support Vector Machine training was designed to be generally applicable to any cancer type rather than based on patterns of gains and losses typical of specific cancer types. With synthetic samples, the Support Vector Machine could detect aneuploidy in 78% of samples with a neoplastic cell fraction of 1% at a specificity of 99% as determined by cross validation. This Support Vector Machine-based algorithm was therefore incorporated into WALDO for the evaluation of clinical samples with low neoplastic composition (see FIG. 38).
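The simpler aggregate scores that were considered before the Support Vector Machine was selected can be sketched as follows. This is not the Support Vector Machine itself; it only illustrates, under assumed thresholds and with hypothetical function names, how many small per-arm deviations can be pooled into a single genome-wide score.

```python
def count_significant_arms(z_scores, threshold=1.96):
    """Number of arms whose z-score lies outside ±threshold (threshold assumed)."""
    return sum(abs(z) > threshold for z in z_scores)

def top_k_score(z_scores, k=3):
    """Sum of the k largest absolute z-scores (k is an illustrative choice)."""
    return sum(sorted((abs(z) for z in z_scores), reverse=True)[:k])

def sum_of_squares(z_scores):
    """Sum of squared z-scores; the text applies this to window-based z-scores."""
    return sum(z * z for z in z_scores)

# Illustrative per-arm z-scores for one sample.
sample = [0.5, -2.0, 3.0, 1.0]
n_significant = count_significant_arms(sample)
pooled_top2 = top_k_score(sample, k=2)
pooled_squares = sum_of_squares(sample)
```

Any of these scalar scores could be thresholded against euploid samples; the source reports that a Support Vector Machine performed better than such approaches on synthetic samples.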

WALDO was then used to evaluate aneuploidy in plasma samples from 961 cancer patients and 566 healthy individuals (see Materials and Methods). Cancers of 8 different types were evaluated (see Table 36). The neoplastic cell fraction of each cancer sample was considered to be the mutant allele fraction determined from deep sequencing data. Samples were divided into those with neoplastic cell fractions >1% (122 samples), between 0.5% and 1% (96 samples), and <0.5% (738 samples). Sensitivity was defined as the proportion of cancer patient samples scored as aneuploid, while specificity was defined as 1 minus the fraction of samples from healthy individuals scored as aneuploid. Receiver operating characteristic (ROC) curves are shown in FIG. 38 for these three ranges of neoplastic fractions. At stringent specificity (99%), aneuploidy was identified in 42% of samples with neoplastic cell fractions >1% (see FIG. 38A). As expected, sensitivity decreased with decreasing neoplastic cell fractions (see FIGS. 38B and 38C). At 99% specificity, WALDO detected aneuploidy in 24% of samples with neoplastic cell fractions of 0.5 to 1% and in 19% of samples with neoplastic cell fractions of 0 to 0.5%. The specific cancer type of the patient was not highly correlated with positive aneuploidy calls. However, the number of template molecules that were assessed did correlate with sensitivity.
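Reading a sensitivity off an ROC curve at a fixed specificity can be sketched as below. The sketch assumes each sample receives a single aneuploidy score where higher means more likely aneuploid; the scores, function names, and tie handling are illustrative assumptions, not WALDO's implementation.

```python
import math

def threshold_at_specificity(healthy_scores, specificity=0.99):
    """Lowest cutoff such that at most (1 - specificity) of healthy scores exceed it."""
    ranked = sorted(healthy_scores)
    # Small epsilon guards against floating-point error in (1 - specificity) * n.
    allowed_false_positives = math.floor((1 - specificity) * len(ranked) + 1e-9)
    if allowed_false_positives >= len(ranked):
        raise ValueError("specificity too low for this number of healthy samples")
    return ranked[len(ranked) - allowed_false_positives - 1]

def sensitivity_at(cancer_scores, threshold):
    """Fraction of cancer samples scoring strictly above the cutoff."""
    return sum(score > threshold for score in cancer_scores) / len(cancer_scores)

# Illustrative scores: 100 healthy samples scored 0..99, four cancer samples.
healthy = list(range(100))
cutoff = threshold_at_specificity(healthy, specificity=0.99)
detected = sensitivity_at([50, 99.5, 120, 98], cutoff)
```

With 566 healthy individuals, the same rule would allow at most 5 healthy samples above the cutoff at 99% specificity.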

In plasma samples with higher neoplastic cell content, it was possible to determine which chromosome arms were gained or lost. Among 558 of the plasma samples that had a paired primary tumor, 188 samples had a significant chromosome arm gain or loss and 54% had a concordant gain or loss in the primary tumor. In samples with low neoplastic content, none of the individual arms were gained or lost at statistically significant levels, but the Support Vector Machine component of WALDO was presumably able to pick out small deviations in multiple chromosome arms that distinguished them from euploid samples.

Sample Matching

DNA profiling with short tandem repeats is a well-established forensic technique that is now routinely used. Carefully curated SNP panels have also been developed to ensure sample identity, such as between tumor and normal specimens from the same patients (Kidd et al., 2006 Forensic science international 164:20-32; and Pengelly et al., 2013 Genome medicine 5:89). The LINEs amplified in FAST-SeqS contain 26,220 common polymorphisms, including variants detected in >1% in 1000 Genomes (Consortium 2012 Nature 491:56-65). These polymorphisms theoretically provide a powerful way to profile DNA samples evaluated for aneuploidy without any additional work or cost. To determine whether such identification was possible in practice, a measure of concordance between any two samples was designed (see Materials and Methods). This concordance measure was then used to compare replicates of 176 normal WBC samples to one another, using ~5 replicates per sample, for a total of 676 WBC samples. The input to WALDO was 676 samples, without specifying the sample name, so there were a total of 456,976 (676×676) possible matches. WALDO correctly matched all replicate samples with high concordance (>99.9%), without any false matching. Next, this protocol was performed on 970 plasma samples and 1,684 tumor samples. The 558 plasma samples with a paired primary tumor should match the corresponding primary tumor samples and no other matches should be observed. This produced 7,038,409 ((970+1,683)×(970+1,683)) possible matches. Nearly all the 2653 samples matched as expected, i.e., only to themselves or to the corresponding primary tumors, with concordance >99.8%. However, two plasma samples were found that did not match the expected primary tumors, and 12 plasma samples were found that matched other plasma samples that were purportedly derived from different donors. In all these “mismatched cases”, the FAST-SeqS data indicated high concordances (>99.8%). The mismatches were therefore most likely a result of mislabeling of the samples and illustrated the utility of a sample identity check with WALDO.
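The all-against-all matching described above can be sketched with a deliberately simple stand-in for the concordance measure. The actual measure is defined in the Materials and Methods and is not reproduced here; the version below, which scores the fraction of shared polymorphic sites with identical genotype calls, is an assumption made only for illustration, as are the sample names and the 0.99 cutoff.

```python
def concordance(genotypes_a, genotypes_b):
    """Fraction of shared sites at which two samples carry the same genotype call."""
    shared = genotypes_a.keys() & genotypes_b.keys()
    if not shared:
        return 0.0
    agreeing = sum(genotypes_a[site] == genotypes_b[site] for site in shared)
    return agreeing / len(shared)

def find_matches(samples, min_concordance=0.99):
    """Return unordered sample pairs whose concordance meets the cutoff."""
    names = sorted(samples)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if concordance(samples[a], samples[b]) >= min_concordance
    ]

# Hypothetical genotype calls at three LINE polymorphisms.
samples = {
    "wbc1_rep1": {"snp1": "AA", "snp2": "AG", "snp3": "GG"},
    "wbc1_rep2": {"snp1": "AA", "snp2": "AG", "snp3": "GG"},
    "wbc2_rep1": {"snp1": "AG", "snp2": "GG", "snp3": "GG"},
}
matches = find_matches(samples)

# Counts of ordered comparisons quoted in the text.
wbc_comparisons = 676 * 676
plasma_tumor_comparisons = (970 + 1683) ** 2
```

Because sample names are not used by `find_matches`, any pair that matches unexpectedly (or fails to match when expected) points to a possible labeling error, which is the use illustrated in the text.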

Mutation Load, Carcinogenic Signatures, Microsatellite Instability

When two samples, a normal and a cancer, are available from the same patient, LINE mutations that are in one sample but not the other can conceivably be discerned. For this application, molecular barcoding to reduce sequencing errors (see, e.g., Kinde et al., 2011 Proceedings of the National Academy of Sciences 108:9530-9535) can be used, as is achieved through the experimental and bioinformatics components of WALDO (see Materials and Methods).

To determine whether somatic mutation detection was feasible, ten urothelial carcinomas of the upper urinary tract (UTUCs) and normal tissues from the same patients were evaluated. These samples had previously been analyzed by exomic sequencing (Hoang et al., 2013 Science translational medicine 5:197ra102-197ra102). For each tumor sample, the number of somatic mutations was counted and the spectrum of single base substitutions (SBS) (A->T, A->C, etc.) was determined. It was found that the number of SBS in LINEs was highly correlated with the number of SBS in the exomes of these tumors (R²=0.98, p<2.6*10⁻⁸). The spectrum of mutations in the LINEs was similarly correlated with the spectrum of mutations in exonic sequences (R²=0.95, p<1.8*10⁻⁶). Notably, six of these tumors were from patients exposed to aristolochic acid and the pathognomonic signature (A->T, T->A) of this mutagen was prominent in these six tumors (see FIG. 42, FIG. 43, and FIG. 44).

The LINEs assessed by WALDO harbor 17,488 mononucleotide tracts of >3 nucleotides. Because mononucleotide tracts are particularly sensitive to defects in mismatch repair, it was determined whether WALDO could be used to assess mismatch repair deficiency. For this purpose, the number of indels in the 17,488 LINE mononucleotide tracts was assessed. It was found that the number of indels in six mismatch repair-deficient colorectal cancers averaged 35 and ranged from 10 to 67. Normal tissues from these patients harbored only zero or one indel, and the difference between the cancers and normal tissues was highly significant (p<7.2*10⁻⁴).
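Locating mononucleotide tracts of more than 3 nucleotides in an amplicon sequence can be sketched with a regular expression. The sequence below is invented for illustration, and the function name is hypothetical; how WALDO enumerates tracts or scores indels within them is not specified in this section.

```python
import re

# A run of the same base repeated at least four times (">3 nucleotides").
MONONUCLEOTIDE_TRACT = re.compile(r"A{4,}|C{4,}|G{4,}|T{4,}")

def mononucleotide_tracts(sequence):
    """Return (0-based start, tract) for every homopolymer run longer than 3 nt."""
    return [(m.start(), m.group()) for m in MONONUCLEOTIDE_TRACT.finditer(sequence.upper())]

tracts = mononucleotide_tracts("ACGTTTTTACAAAAGG")
```

An indel in such a tract would appear as a change in run length between the tumor and the matched normal sequence at the same start position.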

Example 7: Genome-Wide Quantification of Rare Mutations by Bottleneck Sequencing

The accumulation of random somatic mutations in the nuclear and mitochondrial genome over time underlies fundamental theories of carcinogenesis, neurodegeneration, and aging (see, e.g., Stratton et al., 2009 Nature 458:719-724; Kennedy et al., 2012 Mech Ageing Dev 133:118-126; and Vijg et al., 2014 Current opinion in genetics & development 26:141-149). Direct observation of these rare mutations in the human body with age therefore has the potential to enhance our understanding of human disease. Currently, no simple high-throughput method exists to directly and systematically quantify somatic mutational load in normal, non-diseased human tissues at a genome-wide level. Next-generation DNA sequencing (NGS) technologies address this issue, but their sequencing error rate limits the detection of rare mutations (see, e.g., Albertini et al., 1990 Annu Rev Genet 24:305-326; and Cole et al., 1994 Mutat Res 304:33-105).

This Example describes a Bottleneck Sequencing System (BotSeqS) technology designed to accurately detect rare point mutations in any molecularly-barcoded library in a completely unbiased fashion. BotSeqS is a next-generation sequencing method that simultaneously quantifies rare somatic point mutations across the mitochondrial and nuclear genomes. BotSeqS combines molecular barcoding with a simple dilution step immediately prior to library amplification. In this Example, BotSeqS is used to show age- and tissue-dependent accumulations of rare mutations and to demonstrate that somatic mutational burden in normal tissues can vary by several orders of magnitude, depending on biologic and environmental factors. This Example also shows major differences between the mutational patterns of the mitochondrial and nuclear genomes in normal tissues. Lastly, the mutation spectra of normal tissues were different from each other, but similar to those of the cancers that arose in them. This technology can provide insights into the number and nature of genetic alterations in normal tissues and can be used to address a variety of fundamental questions about the genomes of diseased tissues.

Materials and Methods

Human Tissue Samples

Normal, non-diseased tissues for this study were acquired from five different sources (Table 43). For COL229 to COL237 and SIN230, colon or duodenum was obtained from consented patients at the Johns Hopkins Hospital with the approval of its Institutional Review Board. For COL373 to COL375 and BRA01 to BRA09, flash frozen, post-mortem colon and brain were requested from the NIH NeuroBioBank (neurobiobank.nih.gov), with the request being approved and fulfilled by University of Maryland Brain and Tissue Bank (Baltimore, Md.) and University of Miami Brain Endowment Bank (Miami, Fla.). For KID034 to KID038, flash frozen, post-mortem kidney cortex blocks (200 mg) were purchased from Windber Research Institute (Windber, Pa.). COL238 and COL239 were as reported elsewhere (see, e.g., Parsons et al., 1995 Science 268:738-740; Hamilton et al., 1995 The New England journal of medicine 332:839-847; and De Vos et al., 2004 American journal of human genetics 74:954-964). SA_117, SA_118, SA_119, AA_105, AA_124, and AA_126 were as reported elsewhere (see, e.g., Hoang et al., 2013 Science translational medicine 5:197ra102). The initial rationale for the sample size for colon and brain was to acquire at least three individuals in each age group in order to understand the average trend of somatic mutational patterns for each age group. Age groups for colon and brain were selected based on human body growth and maintenance: early body development at <10 years, fully grown young adult body at ~20-40 years, and old, maintained adult body at >90 years. For colon, one tissue from the young child age group (SIN230) was later determined to be duodenum, leaving only two individuals representing the young child age group for colon epithelium. For normal kidney, criteria for kidney acquisition were an age-matched and non-smoking control group for the kidneys of smokers and aristolochic acid-exposed samples. All normal kidney controls were Caucasian and therefore less likely to originate from a high risk AA-exposed population (e.g., Asia). From the same kidney tissue source, three aliquots of flash frozen, post-mortem normal kidney from a five month old individual were available as technical replicates and to further test an age-trend for non-carcinogen exposed normal kidneys.

Preparation of Illumina Y-Adapter-Ligated Molecules

Genomic DNA (34 ng to 1 μg) in 55 μL TE buffer was fragmented using BioRuptor (Diagenode) at high intensity for 15 s on and 90 s off, using 7 cycles at 3° C. After random fragmentation, Illumina Y-adapters were ligated to the DNA fragments using TruSeq DNA PCR-Free kit (Illumina) according to a standard low DNA input Illumina protocol with selection for 350 bp insert sizes. This resulted in adapter-ligated molecules in a total volume of 20 μL.

Dilution of Y-Adapter-Ligated Molecules

Five ten-fold serial dilutions were performed in 96-well PCR plates starting with 2 μL of adapter-ligated molecules (prior to PCR) in 18 μL of dilution buffer (TE containing 1 ng/μL pBlueScript). Samples were mixed by gently pipetting with a multichannel pipette. Two μL of each sample was then transferred into 18 μL of fresh dilution buffer using a multichannel pipette. The mixing and transferring was repeated for a total of five serial dilutions. Only 2 μL of each dilution (1/10 total volume) was used as template for each PCR. A 10³-fold dilution was accomplished as follows: (i) use of 2 μL of the total 20 μL of adapter-ligated molecules (10-fold dilution); (ii) mixing 2 μL of adapter-ligated molecules with dilution buffer in a total volume of 20 μL (10-fold dilution); and (iii) use of 2 μL of diluted adapter-ligated molecules from the total 20 μL volume in the PCR reaction (10-fold dilution, see below). The five serial dilutions resulted in final dilution factors of 10³, 10⁴, 10⁵, 10⁶, and 10⁷.
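The arithmetic behind the final dilution factors can be checked with a short sketch. It follows the three contributions listed above (the fraction of the library entering the series, the volumetric serial dilutions, and the fraction of each dilution entering PCR); the function and parameter names are illustrative.

```python
def final_dilution_factors(n_serial=5, aliquot_ul=2, total_ul=20):
    """Final fold-dilution for each tube of the serial dilution series."""
    step = total_ul // aliquot_ul   # 2 µL in 20 µL is a 10-fold step
    library_fraction = step         # (i) only 2 of 20 µL of library is used
    pcr_fraction = step             # (iii) only 2 of 20 µL of each dilution enters PCR
    # (ii) the k-th tube has undergone k volumetric 10-fold dilutions.
    return [library_fraction * step ** k * pcr_fraction for k in range(1, n_serial + 1)]

factors = final_dilution_factors()
```

The first tube therefore corresponds to 10 × 10 × 10 = 10³, and each further transfer multiplies the factor by 10.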

PCR Amplification of Diluted Y-Adapter-Ligated Molecules

Custom HPLC-purified PCR primers (IDT), TS-PCR Oligo1 (5′-AATGATACGGCGACCACCGAG*A; SEQ ID NO:808) and TS-PCR Oligo2 (5′-CAAGCAGAAGACGGCATACGA*G; SEQ ID NO:809), were designed with one phosphorothioated bond (*) at the 3′ end. PCR was performed in 50 μL total volume with 0.5 μM TS-PCR Oligo1, 0.5 μM TS-PCR Oligo2, Q5 2× HotStart High-Fidelity Master Mix (NEB) at 1× final concentration, and 2 μL of diluted adapter-ligated molecules as template. PCR was performed in a Thermo HyBaid PCR Express HBPX Thermal Cycler. The following PCR program was used: 1) 98° C. for 30 s; 2) 98° C. for 10 s, 69° C. for 30 s, 72° C. for 30 s for 18 cycles; and 3) 72° C. for 2 min. PCR reactions were purified with AMPure XP (Agilent) at 1.0× bead-to-sample ratio according to the manufacturer's protocol.

MiSeq Run and Analysis

A subset of amplified BotSeqS sequencing libraries was evaluated on an Illumina MiSeq instrument (~5 M clusters passed filter per library) to empirically deduce the optimal dilution. The “optimal dilution” was determined to result in 5 to 10 PCR duplicates per molecule when scaled to ½ lane of a HiSeq instrument (~70 M clusters passed filter per library in Rapid Run mode). For example, for an input of 500 ng gDNA into the TruSeq PCR-free library prep (selecting for 350 bp insert size), amplified libraries from the 10⁴-, 10⁵-, and 10⁶-fold dilutions were sequenced at 2×50 bp depth on MiSeq. Three different well-barcoded samples (which were also molecularly barcoded) were multiplexed in one MiSeq lane to test three dilutions of each sample. The .bam output files were uploaded into Galaxy, and Picard's Estimate Library Complexity Tool (Galaxy Tool Version 1.56.0) was executed using the default parameters. Optimal dilutions showed distributions ranging from one to four members per family with singletons comprising ~60-80% of total counts. In general, with an input of 500 ng of gDNA into the TruSeq PCR-free library prep, the 10⁵-fold dilution yielded ~10 members per family on a subsequent HiSeq run used for BotSeqS. From our sequencing data, we estimate the average number of high quality clusters required to identify one rare mutation in colonic tissues was (1) 30 M in a normal child, (2) 12 M in a normal young adult, and (3) 5.8 M in a normal old adult.

Whole Genome Sequencing

Thirty-two whole-genome sequencing (WGS) libraries were generated from the 34 individuals in this study. In the remaining two individuals without WGS, COL238 and COL239, Sanger sequencing was performed to exclude clonal variants in the BotSeqS data. Of the final 20 μL of adapter-ligated molecules used to prepare BotSeqS libraries (prior to dilution), 10 μL was used to amplify a library for whole-genome sequencing using TruSeq PCR Primer Cocktail (Illumina) and TruSeq PCR Master Mix (Illumina) according to TruSeq PCR protocol. PCR reactions were purified with AMPure XP (Agilent) at 1.0× bead-to-sample ratio according to the manufacturer's instructions. The libraries were PE sequenced 2×100 bp on Illumina HiSeq at >30× coverage.

Spike-in Sensitivity Experiment

Two DNA mixtures were prepared from the DNA of normal spleen samples PEN93 and PEN95. Whole genome sequence data was available from these two samples (see, e.g., Jiao et al., 2011 Science 331:1199-1203) and SNPs in PEN93 that were not present in PEN95 could be identified. Both mixtures contained the same amount of PEN95 DNA, but the low spike-in mix contained only 10% of the PEN93 DNA contained in the high spike-in mix. BotSeqS libraries from these samples were first analyzed using the normal BotSeqS pipeline to minimize clonal and germline mutations. Indeed only a total of two mutations were detected among the two libraries; these two mutations likely represented rare mutations in the PEN95 sample, and suggest a mutation frequency of ~8×10⁻⁷ mutation/bp. Next, the data were processed through the BotSeqS pipeline without filtering out mutations that were present in dbSNP (build 130 and 142). Seven PEN93-specific SNPs in the low spike-in and 89 PEN93-specific SNPs in the high spike-in mixtures were identified. After normalizing for the number of sequenced bases, the “mutation frequency” (number of PEN93-specific SNPs/bp) was 2.71×10⁻⁶ for the low spike-in and 2.01×10⁻⁵ for the high spike-in samples. The difference between the low spike-in and the high spike-in was 7.4-fold, within the range expected from the 10-fold dilution given the relatively low number of mutations identified in the low spike-in sample.

Characterization of BotSeqS Specificity

As one measure of specificity, we identified rare mutations as usual except that we used mutations that were present in only one strand rather than in both. Specifically, mutations were present in ≥90% of the Watson family members and the reference sequence was present in >90% of the Crick family members, or vice versa, but otherwise satisfied our criteria for being “rare”. False Watson and Crick pairings were then created, where the Watson strand had overlapping but different coordinates than the Crick strand, and vice versa, to determine if they contained the same mutation by chance. Because BotSeqS works by having low coverage throughout the genome, generated through the bottleneck dilution step, this analysis was precluded in the nuclear DNA. Instead, mtDNA was used because of the multiple copies of mtDNA per cell. The coverage of mtDNA with BotSeqS is much higher than that of nuclear DNA and facilitated the identification of overlapping molecules. Thirty BotSeqS control libraries were processed this way and a total of 146 mtDNA mutations present in one strand only were identified. Using this dataset, each sample was then searched for overlapping molecules and 27 examples were identified. None of the 27 false Watson and Crick pairs shared the same artifactual mutation.

Non-random shearing could produce another type of artifact, falsely suggesting that the Watson and Crick strands of a family were derived from the same molecule when they were actually derived from two different molecules that coincidentally had the same genomic coordinates. To test for such artifacts, Watson and Crick family pairs were identified that contained the variant in the Watson strand and the reference sequence in the Crick strand, or vice versa, but this time included heterozygous germline variants rather than just the rare variants, and in nuclear DNA rather than in mtDNA. There are many more heterozygous variants in nuclear DNA than in mtDNA because the mtDNA is derived only from the oocyte. The discordances of interest could arise as a result of mispairing of a Watson strand with a Crick strand derived from a different template molecule (i.e., non-random shearing). Alternatively, discordances could result from an amplification error in one of the two strands during an early PCR cycle. Using our WGS data, we first identified 8,535,891 nuclear heterozygous variants observed among the 30 DNA samples used for the control BotSeqS libraries (median of 268,180 variants per library with range 121,851 to 529,922, with the same common variants present in many libraries). From the 8,535,891 nuclear heterozygous variants, we identified a total of 3,960,818 families (median of 123,134 families per library with range 65,832 to 222,135) for which both strands could be evaluated. Of these, 3,960,807 families had the concordant sequence at the variant position in both strands; only 11 heterozygous variants were discordant (i.e., the variant was present in ≥90% of the Watson family members and the reference sequence was present in ≥90% of the Crick family members, or vice versa). The rate of discordant germline heterozygous variants was thus 2.78×10⁻⁶ (11 out of 3,960,818) per bp. This rate is compatible with the known error rate of high fidelity DNA polymerases and could easily represent an amplification error that occurred in one of the two strands during the first PCR cycle, so it represents an overestimate of shearing artifacts. Furthermore, it is important to note that BotSeqS eliminates such amplification errors by requiring mutations to be observed on both strands. Because BotSeqS requires mutations to be observed on both strands, the actual false positive rate can be estimated to be ~(⅓)(2.78×10⁻⁶)(2.78×10⁻⁶)=2.58×10⁻¹².
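The two figures quoted above can be reproduced with a few lines of arithmetic. The factor of 1/3 is taken directly from the text; one plausible reading, offered here only as an assumption, is that an independent error on the second strand must also produce the same one of three possible substitutions. The function names are illustrative.

```python
def discordance_rate(n_discordant, n_families):
    """Fraction of evaluable families with strand-discordant heterozygous variants."""
    return n_discordant / n_families

def both_strand_false_positive_rate(single_strand_rate, same_base_probability=1 / 3):
    """Chance that independent errors on both strands yield the same substitution."""
    return same_base_probability * single_strand_rate * single_strand_rate

rate = discordance_rate(11, 3_960_818)                   # about 2.78e-6
false_positive = both_strand_false_positive_rate(2.78e-6)  # about 2.58e-12
```

The estimate in the text uses the rounded single-strand rate of 2.78×10⁻⁶, which is why the rounded value is passed to the second function here.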

Generation of BotSeqS Change and Molecule Tables

Sequence alignments and variant calling were performed with the Illumina secondary analysis package (CASAVA 1.8) using ELANDv2 matching to the GRCh37/hg19 human reference genome. High-quality reads were selected for further analysis only if they satisfied all of the following criteria: (i) passed chastity filter, (ii) read mapped in a proper pair, (iii) <5 mismatches to reference sequence, and (iv) perfect identity to reference sequence within the first and last five bases of each read. Sequencing reads were grouped into families based on identical paired-end endogenous barcodes. The members of a family were further subdivided into the two possible sequencing orientations to determine the number of Watson and Crick-derived family members. Watson and Crick families had identical genomic coordinates with each end sequenced in opposite reads. Quality scores of identical changes within a family were calculated as the average among the family members. The output for each BotSeqS library was two annotated tables of changes and template molecules (i.e., families).
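The read selection and family grouping described above can be sketched as follows. The record layout, field names, and the use of (chromosome, start, end) as the endogenous barcode are assumptions made for illustration; the actual pipeline operated on CASAVA output and is not reproduced here.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadPair:
    chrom: str
    start: int                      # fragment start coordinate
    end: int                        # fragment end coordinate
    strand: str                     # "W" (Watson) or "C" (Crick) orientation
    passed_chastity: bool
    proper_pair: bool
    mismatches: int
    end_bases_match_reference: bool  # first and last five bases match reference

def is_high_quality(read):
    """Apply criteria (i) through (iv) from the text."""
    return (
        read.passed_chastity
        and read.proper_pair
        and read.mismatches < 5
        and read.end_bases_match_reference
    )

def group_families(reads):
    """Count Watson and Crick members per family, keyed by fragment coordinates."""
    families = defaultdict(lambda: {"W": 0, "C": 0})
    for read in reads:
        if is_high_quality(read):
            families[(read.chrom, read.start, read.end)][read.strand] += 1
    return dict(families)

reads = [
    ReadPair("chr1", 100, 450, "W", True, True, 0, True),
    ReadPair("chr1", 100, 450, "W", True, True, 1, True),
    ReadPair("chr1", 100, 450, "C", True, True, 0, True),
    ReadPair("chr1", 100, 450, "C", True, True, 6, True),   # too many mismatches
    ReadPair("chr2", 500, 860, "W", False, True, 0, True),  # fails chastity filter
]
families = group_families(reads)
```

Each key in `families` corresponds to one template molecule, and the two counts give the number of Watson-derived and Crick-derived members that survive filtering.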

Selection of High Quality Changes and Molecules

Custom algorithms were written in Microsoft SQL Server Management Studio to query the changes and molecules tables for each BotSeqS library. Selection criteria are detailed in Table 44-Table 48. In general, selection was based on quality, clonality, and mappability of single base pair substitutions. For example, it is known that one of the major sources of errors facing all short read aligners and variant callers is artifacts that arise when variants map to repetitive regions in the genome, including low complexity regions and copy number variants (see, e.g., Li et al., 2014 Bioinformatics 30:2843-2851). The BotSeqS pipeline eliminates this universal error in a downstream step by filtering out the genomic noise from repetitive DNA and structural variants (detailed in Table 48). Indels were excluded because they are prone to alignment artifacts and are ~10 times less frequent than spontaneous point mutations. High quality single-base substitutions were defined as those with average quality scores (within the family) of ≥Q30 and with ≥2 reads and ≥90% mutation fraction in both the Watson and Crick strands. Variants were considered to be clonal if the variant position was present in the WGS data from that sample or observed in >1 template molecules (i.e., both strands of more than one UID). Any positions present in dbSNP130 or dbSNP142 were also excluded. It was noticed that the dbSNP filtering drastically minimized recurrent sequencing or mapping artifacts and highly mutable regions. For example, homopolymer tracts (≥8 bp) are mutation hotspots that flood the mutation list. It was observed that nearly all were filtered out with dbSNP142. Finally, families that harbored >1 mutation were excluded as possible mapping artifacts.
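The definition of a high quality single-base substitution can be expressed as a small predicate. The original selection was implemented as SQL queries that are not shown; the sketch below is a Python restatement under the assumption that "≥2 reads" refers to the number of family members on each strand. Function and parameter names are hypothetical.

```python
def strand_supports(mutant_reads, total_reads, min_reads=2, min_fraction=0.90):
    """A strand supports the change if it has enough reads and enough of them are mutant."""
    return total_reads >= min_reads and mutant_reads / total_reads >= min_fraction

def is_high_quality_substitution(average_quality, watson, crick):
    """watson and crick are (mutant_reads, total_reads) tuples for one family."""
    return (
        average_quality >= 30
        and strand_supports(*watson)
        and strand_supports(*crick)
    )

accepted = is_high_quality_substitution(35, watson=(9, 10), crick=(5, 5))
rejected_fraction = is_high_quality_substitution(35, watson=(8, 10), crick=(5, 5))
rejected_quality = is_high_quality_substitution(29, watson=(9, 10), crick=(5, 5))
rejected_depth = is_high_quality_substitution(35, watson=(1, 1), crick=(5, 5))
```

The clonality, dbSNP, and mappability filters described in the text would be applied as further steps on the changes that pass this predicate.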

Calculation of Mutation Frequency

Mutation frequencies were determined for each BotSeqS library (see Table 51) by dividing the total number of rare mutations by the total bp sequenced. The total bp sequenced was defined as the number of families × 2 × read length of each family. The average length of the libraries was ~500 bp such that the 100 bp paired-end reads were unlikely to overlap. Only templates with perfect identity to the reference sequence in the first and last 5 bp of every read were considered. The reads were further trimmed by excluding cycles 6 and 7 to ensure quality. Therefore, the actual read length was 88 bases (100−7−5=88). For the samples from which technical replicate BotSeqS libraries were generated, the average mutation frequency of the technical replicates was considered the mutation frequency for the sample.
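The mutation frequency calculation can be written out as a worked example. The formula and the effective read length of 88 bases come from the text; the mutation and family counts below are invented solely to show the arithmetic, and the function names are illustrative.

```python
def effective_read_length(read_length=100, leading_excluded=7, trailing_excluded=5):
    """Bases counted per read after excluding the first 7 cycles and last 5 bases."""
    return read_length - leading_excluded - trailing_excluded

def mutation_frequency(n_rare_mutations, n_families, read_length=88):
    """Rare mutations per bp: mutations / (families x 2 reads x effective read length)."""
    total_bp = n_families * 2 * read_length
    return n_rare_mutations / total_bp

length = effective_read_length()
# Hypothetical library: 10 rare mutations among 1,000,000 families.
frequency = mutation_frequency(10, 1_000_000, read_length=length)
```

For a sample with technical replicates, the per-library frequencies computed this way would be averaged to give the sample's mutation frequency.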

Validation of Somatic Mutations

All rare mutations from the nuclear and mtDNA genome passed visual inspection of the sequencing reads. For rare nuclear mutations, Sanger sequencing was performed on a representative set (514 out of 876 mutations). Of these, 514 of 514 (100%) were confirmed to be invisible by Sanger sequencing (excluding the COL238 and COL239 samples that did not have a matched WGS). This demonstrated that these mutations were neither present in the germline nor present in a highly clonal fashion. Mutations confirmed to be absent upon Sanger sequencing are indicated in Table 50.

Comparison to Cancer Genomes

Nineteen MAF files representing nuclear somatic mutations from 19 TCGA tumor types were downloaded at synapse.org/#!Synapse:syn1729383 (see, e.g., Kandoth et al., 2013 Nature 502:333-339). From the TCGA data, only single-base substitutions were considered and somatic mutations from ultra-mutated tumors were excluded. Mitochondrial DNA somatic mutations from colorectal and renal tumors were derived from supplementary file 2 of Ju et al. (2014 eLife 3).

Statistics

For study design, no prior power analysis or randomization was performedbecause the variance was initially unknown. The goal of the study was tofind major, biologically meaningful differences between the cohorts. Tofind major differences, sample sizes can be small. Even with the smallsample size, however, no violations of the assumptions of the tests weredetected, including violations about the homogeneity of variances.T-test and ANOVA analyses were performed using GraphPad Prism 5.0f.Fisher's exact test was performed using R version 3.2.2. Principalcomponent analysis was performed in R. All analyzed samples werereported in the manuscript.

Results

Principles Underlying BotSeqS

The principal feature of BotSeqS is the dilution of any type of sequencing library prior to PCR amplification. This dilution creates a bottleneck and permits an efficient, random sampling of double-stranded template molecules with a minimal amount of sequencing. Rare mutations, which would normally be masked by an abundance of wild-type sequences in conventional libraries, account for much more of the signal at the corresponding genomic position in a bottlenecked library. Dilution also increases the likelihood that both the “Watson” and “Crick” strands of a DNA molecule will be sequenced redundantly, a feature critical for the high accuracy of BotSeqS and the relatively small amount of sequencing required to implement it. The presence of the same rare mutation on both strands can substantially decrease artifacts and increase specificity (see, e.g., Schmitt et al., 2012 PNAS USA 109:14508-14513). Finally, the random nature of dilution allows DNA molecules from both nuclear and mitochondrial genomes to be assessed from one library.

Generation of BotSeqS Libraries

A standard Illumina TruSeq PCR-Free kit was used to generate 44 BotSeqS libraries from the normal tissues of 34 individuals (Table 43). This included nine individuals with one or two technical replicates. In addition, 10 of our 12 cohorts had more than one biological replicate, each containing two to six individuals.

The preparation of BotSeqS libraries starts with the random shearing of genomic DNA (FIG. 52). This fragments the genomes into variably-sized DNA molecules, each possessing unique end coordinates called endogenous barcodes (see, e.g., Kinde et al., 2011 PNAS USA 108:9530-9535). Following ligation of standard sequencing adapters to the DNA molecules, the library is diluted to reduce the number of molecules in the population. To identify the correct dilution factor, a ten-fold dilution series was assessed on a MiSeq instrument (FIG. 53). After dilution, PCR amplification of the library generates multiple copies (duplicates) of each DNA molecule. The endogenous barcodes enable the grouping of sequencing reads into families, also known as UIDs, for unique identifiers (see, e.g., Kinde et al., 2011 PNAS USA 108:9530-9535); each family represents the PCR-derived progeny of a single-stranded template and each member of a family represents the sequence from a single cluster on the Illumina instrument. In the following, we consider the Watson strand to be the sequence derived from the first read of the sequencing instrument (Illumina adapter P5) and the Crick strand to be the sequence derived from the second read (Illumina adapter P7) of each member of the family (FIG. 52). To be considered a potential mutation, BotSeqS required that the identical sequence change be observed in ≥90% of the Watson and in ≥90% of the Crick family members and that each family be composed of at least two members. BotSeqS libraries were analyzed using an Illumina HiSeq 2500 instrument on rapid run mode with paired-end reads of 100 bases each. A median of 70 million (M) clusters per library passed the standard Illumina quality filters (range 37 to 188 M clusters per library; Table 43).
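The two-strand calling rule stated above (the identical change in ≥90% of Watson and ≥90% of Crick family members, with at least two members per family) can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the study's SQL implementation: the function names are hypothetical, each family is reduced to a list of base calls at a single position, and the calls from both strands are assumed to be already expressed relative to the same reference strand.

```python
from collections import Counter

# Hypothetical helper names; a simplified model of the calling rule in the text.

def strand_supported_variant(calls, ref_base, min_members=2, min_fraction=0.90):
    """Return the variant base supported by one strand family, or None.

    calls: base calls at one position, one per family member (read).
    """
    if len(calls) < min_members:
        return None
    variants = Counter(base for base in calls if base != ref_base)
    if not variants:
        return None
    base, count = variants.most_common(1)[0]
    return base if count / len(calls) >= min_fraction else None

def call_potential_mutation(watson_calls, crick_calls, ref_base):
    """Call a potential mutation only if both strands support the same change."""
    watson = strand_supported_variant(watson_calls, ref_base)
    crick = strand_supported_variant(crick_calls, ref_base)
    if watson is not None and watson == crick:
        return watson
    return None
```

A change seen on only one strand, such as a G>T artifact present in the Watson family but absent from the Crick family, returns `None` under this rule.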

BotSeqS Data Processing Pipeline

The goal of the BotSeqS pipeline was to accurately identify rare,somatic point mutations and to calculate the frequency of thesemutations in the sample. To process the data for this purpose, rawsequencing data were input into Illumina's secondary analysis package(CASAVA 1.8) with ELANDv2 mapping to GRCh37/hg19 human reference genome.The BotSeqS pipeline begins by selecting high quality reads for analysis(see Materials and Methods). The data are then organized into two tablesfor each BotSeqS library: (i) a “change” table listing all differencesfrom the reference sequence and (ii) a unique molecule table listing allfamilies. Importantly, each table contains strand information; almosthalf (median 45%, range 8% to 62%) of the unique molecules from eachBotSeqS library had both the Watson and Crick strands represented in thedataset, ensuring specificity in the subsequent mutation analysis.Moreover, most BotSeqS libraries (37 of 44) had a median number offamily members between 5 and 20 (FIG. 56), further demonstrating thatthe libraries underwent successful bottlenecking.

To identify rare, somatic mutations, it was necessary to eliminategermline and clonal variants from the BotSeqS data (we defined clonal asthose present in both strands of more than one template molecule). Weperformed whole genome sequencing (WGS) of the same DNA sample or thesame libraries that had been diluted for BotSeqS for 32 of the 34individuals in this study (Table 43). For the remaining two individuals(COL238 and COL239), Sanger sequencing was performed to eliminate clonalvariants, demonstrating that WGS was not necessary for BotSeqS. The vastmajority (median 92%, range 88-94%) of variants were found to begermline, easily identifiable from the matched WGS dataset. In additionto clonality, we eliminated potential artifacts by considering onlywell-mapped positions and by using other filters (Table 44-Table 48 andMaterials and Methods). The requirement for mutations to be present onboth strands was indeed necessary because, in the absence of thisfilter, there was a large number of G>T transversions (FIG. 57), knownto represent artifacts in NGS library preparations (see, e.g., Costelloet al., 2013 Nucleic acids research 41:e67). A “spike-in” validationexperiment was further performed by mixing one individual's normal DNA(PEN93) into another individual's normal DNA (PEN95) at two differentratios. Using BotSeqS, it was possible to detect PEN93-specific SNPs inboth samples with a 7.4-fold difference in frequency between the low andhigh spike-ins, within the expected error of the intended 10-folddifference (see Materials and Methods).

From the 44 BotSeqS libraries, a total of 666 and 876 rare somatic pointmutations were identified in mtDNA and nuclear DNA, respectively (Table49 and Table 50). All rare mutations passed visual inspection and asubset was Sanger-sequenced to confirm that the mutations were notgermline or highly prevalent in the samples evaluated (see Materials andMethods). As expected from previous studies, point mutation frequenciesof mtDNA (1.40±1.29×10⁻⁵ mutation/bp, mean±s.d.) were significantlyhigher than those of nuclear DNA (5.23±3.47×10⁻⁷) in 25 controlindividuals (two-tailed t-test, P<0.0001; Table 51). The specificity ofBotSeqS was further determined using discordant germline heterozygouscalls to estimate a false positive rate of 2.58×10⁻¹² (see Materials andMethods).

Mutation Frequencies Vary with DNA Repair Capacity and CarcinogenExposure

It was first asked if BotSeqS can detect the elevated levels ofmutations in the normal tissues of mismatch repair deficientindividuals. Individuals with biallelic inactivating germline mutationsin mismatch repair machinery show higher levels of mutation in bothnormal and tumor tissues (see, e.g., Parsons et al., 1995 Science268:738-740; and Shlien et al., 2015 Nature genetics 47:257-262).Therefore, DNA was tested from normal colon epithelium of individuals(COL238 and COL239) with biallelic germline inactivating mutations inthe Post-Meiotic Segregation 2 (PMS2) gene. Using BotSeqS, it was foundthat the average mutation frequency of nuclear DNA in these two siblings(6.63±3.47×10⁻⁵ mutations/bp; ages 16 and 18) was significantly higherthan that in similarly aged individuals (5.13±1.73×10⁻⁷ for COL235,COL236, COL237, COL374; average age 24) with proficient mismatch repair(two-tailed t-test, P<0.05, FIG. 53a ). This 129-fold increase innuclear mutation frequency was associated with a significant differencein the nuclear mutational spectrum between PMS2^(+/+) and PMS2^(−/−)cohorts (Fisher's exact test, P=0.04, FIG. 53b ).

It was also tested if BotSeqS could identify a high number of mutations in the normal tissues of individuals exposed to environmental carcinogens. Genome-wide sequencing of upper tract urothelial carcinomas was previously performed, representing a cancer type associated with exposure to aristolochic acid (AA) or smoking (see, e.g., Hoang et al., 2013 Science translational medicine 5:197ra102). Mutagens in tobacco smoke as well as AA are metabolized to form DNA-adducts in the normal kidney cortex (see, e.g., Hoang et al., 2013 Science translational medicine 5:197ra102; and Randerath et al., 1989 Journal of the National Cancer Institute 81:341-347). Four age-matched normal kidney cortices from individuals (KID034, KID035, KID036, KID037; average age 64 years) without known exposure to tobacco smoke or to AA were compared with the normal kidney cortex of three heavy smokers (SA_117, SA_118, SA_119; average age 65 years) as well as with three individuals who had been exposed to AA (AA_105, AA_124, AA_126; average age 79 years). The nuclear point mutation frequencies in smokers and AA-exposed kidneys were significantly higher, by 27- and 36-fold, respectively, than in the non-exposed controls (one-way ANOVA with Bonferroni multiple comparison post-test, P<0.0001 for AA and P<0.001 for smoking) (FIG. 53a). This increased number of mutations in the nuclear genome was associated with a significantly altered nuclear mutational spectrum (Fisher's exact test with Bonferroni multiple comparison correction, P=2.58×10⁻⁸ for AA and P=1.51×10⁻¹⁵ for smoking) (FIG. 53b). Interestingly, the mtDNA point mutation frequencies and spectra between the non-exposed and exposed groups were not significantly different, despite the dramatic difference in their nuclear genomes (FIG. 53a, b).

Rare Mutations Accumulate with Age

Many lines of evidence indicate that the human body accumulates randommutations with age. BotSeqS was designed to directly measure differencessuch as these and we tested whether rare point mutation frequencies inthe DNA of three normal human tissues were dependent upon age. Normalcolonic epithelium from 11 individuals showed mutation frequencies thatsignificantly increased with age, by an average of 30-fold in mtDNA and6.1-fold in nuclear DNA, over 91 years (see below and FIG. 54; one-wayANOVA with Bonferroni multiple comparison post-test, P<0.001 for both).Similarly, mutation frequencies increased by an average of 19-fold inmtDNA and 6.5-fold in nuclear DNA over 64 years in normal kidneycortices. The mutation frequencies in brain frontal cortex alsosignificantly increased with age, albeit more slowly, by 7.3-fold inmtDNA and 5.7-fold in nuclear DNA over 90 years (one-way ANOVA withBonferroni multiple comparison post-test, P<0.001 for mtDNA and P<0.05for nuclear).

Summary of Rare Mutation Frequencies in Normal Human Tissues

Average mutation frequencies are given in units of ×10⁻⁷ mutations/bp.

Genome | Normal Tissue | Number of Individuals | Young Child | Young Adult | Old Adult | Lifespan (years) | Lifespan Fold-Difference
mtDNA | Brain | 9 | 18 ± 7 | 43 ± 6 | 131 ± 18 | 89.5 | 7.3
mtDNA | Kidney | 5 | 15 | nd | 277 ± 64 | 63.8 | 18.5
mtDNA | Colon | 11 | 12 ± 17 | 112 ± 43 | 365 ± 103 | 90.8 | 30.4
Nuclear | Brain | 9 | 1.1 ± 0.3 | 2.2 ± 1.1 | 6.3 ± 2.3 | 89.5 | 5.7
Nuclear | Kidney | 5 | 1.2 | nd | 7.8 ± 1.5 | 63.8 | 6.5
Nuclear | Colon | 11 | 1.8 ± 0.5 | 5.5 ± 1.6 | 11 ± 1.5 | 90.8 | 6.1

nd, not determined

Within the dataset, point mutation frequencies in brain versus colonictissues in three different age groups (children<10 years; adults between20 and 40 years; and old adults≥90 years) could be directly compared.Interestingly, the nuclear mutation frequency in colon was notsignificantly different from that of the brain in children(1.81±0.45×10⁻⁷ in colon vs. 1.06±0.27×10⁻⁷ in brain, two-way ANOVA withBonferroni multiple comparison post-test, P>0.05). However, the mutationfrequency in the colon was significantly higher than that of the brainin young adults (5.51±1.62×10⁻⁷ in colon vs. 2.16±1.11×10⁻⁷ in brain,two-way ANOVA with Bonferroni multiple comparison post-test, P<0.05) aswell as in old adults (1.10±0.15×10⁻⁶ in colon vs. 6.29±2.31×10⁻⁷ inbrain, two-way ANOVA with Bonferroni multiple comparison post-test,P<0.01) (FIG. 58). No significant differences were found between themtDNA mutation frequency of the colon versus that of brain in relativelyyoung individuals (children or young adults). However, the mtDNAmutation frequency in the colon was significantly higher than that ofthe brain in old individuals (3.65±1.03×10⁻⁵ in colon vs. 1.31±0.18×10⁻⁵in brain, two-way ANOVA with Bonferroni multiple comparison post-test,P<0.0001) (FIG. 58).

The Mutational Patterns in mtDNA are Very Different from Those ofNuclear DNA

The spectra of the rare point mutations in each normal tissue studied were examined. Mutations in mtDNA were dominated by transitions (97% in colon, 89% in kidney, and 91% in brain) with a heavy strand bias, as expected from previous studies (FIG. 54 and Table 49). The ratio of transitions to transversions was strikingly different in mtDNA (average of 15.3) compared to nuclear DNA (average of 1.1) in all three tissues.

To further assess the differences in mutation frequencies between thetwo genomes, we calculated the ratio between mtDNA-to-nuclear mutationfrequencies for each individual (Table 51). Point mutation frequenciesin the mtDNA were on average 24.5-fold higher than the nuclear genome innormal tissues (control cohort, FIG. 59). In patients with exposurehistories or DNA repair defects, the ratios were significantly smallerdue to the concomitantly greater number of nuclear (but notmitochondrial) DNA mutations in such individuals compared to those fromthe control cohort (one-way ANOVA with Bonferroni multiple comparisonpost-test, P<0.05) (FIG. 59).

Mutational Spectra are Tissue-Specific

Though rare mutations in mtDNA are dominated by transitions, there arestill tissue-specific mtDNA differences that can be appreciated from thepie charts in FIG. 3. For example, mitochondrial C:G to T:A transitionswere more prominent, and A:T to G:C transitions less prominent, innormal colon (54% and 42%, respectively) and brain (51% and 40%,respectively) compared to normal kidney tissues (36% and 53%,respectively). The mutation spectra in the nuclear DNA of all threetissues were much more diverse. For example, C:G to T:A transitionspredominated in normal colon (44% in colon compared to 22% in kidney and29% in brain), while normal kidney and brain harbored a proportionatelygreater fraction of A:T to G:C transitions (25% in kidney and 19% inbrain compared to 15% in colon) as well as A:T to C:G transversions (12%in kidney and 16% in brain compared to 5% in colon). Moreover, A:T toT:A transversions were more frequent in kidney (16%) compared to colon(6%) and brain (6%). Pairwise comparisons of the mutational spectrawithin each genome revealed significant differences between thesubstitution pattern of kidney and colon (Fisher's exact test withBonferroni multiple comparison correction, P=0.0029 in mtDNA andP=0.0312 in nuclear DNA).

The spectra of the rare mutations found in normal kidney and colontissues were compared to the clonal DNA mutations in cancers derivedfrom the cells of these organs, using publically available data for thelatter (see, e.g., Ju et al., 2014 eLife 3; and Kandoth et al., 2013Nature 502:333-339). Brain frontal cortex was excluded in this analysisbecause it was not clear what tumor type should be used for comparison.To search for similarities and differences among normal and tumormutational spectra, principal component analysis was performed on thenuclear and mtDNA spectra derived from the data on normal kidney cortex,normal colon epithelium, clear cell renal carcinoma, and colorectalcarcinoma. It was found that the spectra of the rare mutations in normalcolon and kidney tissues were very similar to those of the correspondingcancer type (FIG. 60).

Example 8: Safe Sequencing System

Genetic mutations underlie many aspects of life and death including, forexample, evolution and disease, respectively (see, e.g., Luria et al.,1943 Genetics 28:491-511; Roach et al., 2010 Science 328:636-639; Durbinet al., 2010 Nature 467:1061-1073; Shibata, 2011 Carcinogenesis32:123-128; McMahon et al., 2007 N Engl J Med 356:2614-2621; Eastman etal., 1998 J Infect Dis 177:557-564; Chiu et al., 2008 Proc Natl Acad SciUSA 105:20458-20463; and Fan et al., 2008 Proc Natl Acad Sci USA105:16266-16271). Detection of such mutations, particularly at a stageprior to their becoming dominant in the population, will likely beessential to optimize diagnoses and/or therapy. For example, inneoplastic diseases, which are all driven by somatic mutations, theapplications of rare mutant detection are manifold; they can be used tohelp identify residual disease at surgical margins or in lymph nodes, tofollow the course of therapy when assessed in plasma, and perhaps toidentify patients with early, surgically curable disease when evaluatedin stool, sputum, plasma, and other bodily fluids (see, e.g., Hogue etal., 2003 Cancer Res 63:5723-5726; Thunnissen et al., 2003 J Clin Pathol56:805-810; and Diehl et al., 2008 Gastroenterology 135:489-498).

This Example describes a “Safe-SeqS” (Safe-Sequencing System) to achieve a very high level of accuracy and sensitivity from sequence data. Safe-SeqS can be used to assess the fidelity of a polymerase, the accuracy of in vitro nucleic acid synthesis, and the prevalence of mutations in nuclear or mitochondrial nucleic acids of normal cells. Safe-SeqS can also be used to detect and/or quantify mosaicism and somatic mutations. See, also, WO 2012/142213, incorporated herein by reference in its entirety.

Materials and Methods

Endogenous UIDs

Genomic DNA from human pancreas or cultured lymphoblastoid cells wasprepared using Qiagen kits. The pancreas DNA was used for the captureexperiment and the lymphoblastoid cells were used for the inverse PCRexperiment. DNA was quantified by optical absorbance and with qPCR. DNAwas fragmented to an average size of ˜200 bp by acoustic shearing(Covaris), then end-repaired, A-tailed, and ligated to Y-shaped adaptersaccording to standard Illumina protocols. The ends of each templatemolecule provide endogenous UIDs corresponding to their chromosomalpositions. After PCR-mediated amplification of the libraries with primersequences within the adapters, DNA was captured with a filter containing2,594 nt corresponding to six cancer genes. After capture, 18 cycles ofPCR were performed to ensure sufficient amounts of template forsequencing on an Illumina GA IIx instrument.

For the inverse PCR experiments (FIG. 65), we ligated custom adapters (IDT, Table 57) instead of standard Y-shaped Illumina adapters to sheared cellular DNA. These adapters retained the region complementary to the universal sequencing primer but lacked the grafting sequences required for hybridization to the Illumina GA IIx flow cell. The ligated DNA was diluted into 96 wells and the DNA in each column of 8 wells was amplified with a unique forward primer containing one of 12 index sequences at its 5′ end plus a standard reverse primer (Table 57). Amplifications were performed using Phusion HotStart I (NEB) in 50 uL reactions containing 1× Phusion HF buffer, 0.5 mM dNTPs, 0.5 uM each forward and reverse primer (both 5′-phosphorylated), and 1U of Phusion polymerase. The following cycling conditions were used: one cycle of 98° C. for 30s; and 16 cycles of 98° C. for 10s, 65° C. for 30s, and 72° C. for 30s. All 96 reactions were pooled and then purified using a Qiagen MinElute PCR Purification Kit (cat. no. 28004) and a QIAquick Gel Extraction kit (cat. no. 28704). To prepare the circular templates necessary for inverse PCR, DNA was diluted to ~1 ng/uL and ligated with T4 DNA Ligase (Enzymatics) for 30 minutes at room temperature in a 600 uL reaction containing 1× T4 DNA Ligation Buffer and 18,000U of T4 DNA Ligase. The ligation reaction was purified using a Qiagen MinElute kit. Inverse PCR was performed using Phusion Hot Start I on 90 ng of circular template distributed in twelve 50 uL reactions, each containing 1× Phusion HF Buffer, 0.25 mM dNTPs, 0.5 uM each of KRAS forward and reverse primers (Table 57) and 1U of Phusion polymerase. The KRAS-specific primers both contained grafting sequences for hybridization to the Illumina GA IIx flow cell (Table 57). The following cycling conditions were used: one cycle of 98° C. for 2 minutes; and 37 cycles of 98° C. for 10 seconds, 61° C. for 15 seconds, and 72° C. for 10 seconds. The final purification was performed with a NucleoSpin Extract II kit (Macherey-Nagel) and eluted in 20 uL NE Buffer. The resulting DNA fragments contained UIDs composed of three sequences: two endogenous ones, represented by the two ends of the original sheared fragments, plus the exogenous sequence introduced during the indexing amplification. As 12 exogenous sequences were used, this increased the number of distinct UIDs by 12-fold over that obtained without exogenous UIDs. This number could easily be increased by using a greater number of distinct primers.

Exogenous UIDs

Genomic DNA from normal human colonic mucosae or blood lymphocytes was prepared using Qiagen kits. The DNA from colonic mucosae was used for the experiments on CTNNB1 and mitochondrial DNA, while the lymphocyte DNA was used for the experiments on CTNNB1 and on polymerase fidelity. DNA was quantified with Digital PCR using primers that amplified single-copy genes from human cells (Analysis of Polymerase Fidelity and CTNNB1), qPCR (mitochondrial DNA), or by optical absorbance (oligonucleotides). Each strand of each template molecule was encoded with a 12 or 14 base UID using two cycles of amplicon-specific PCR, as described in the text and FIG. 63. The amplicon-specific primers both contained universal tag sequences at their 5′ ends for a later amplification step. The UIDs consisted of sequences of 12 or 14 random nucleotides appended to the 5′ end of the forward amplicon-specific primers (Table 57). These primers can generate 16.8 and 268 million distinct UIDs, respectively. It is important that the number of distinct UIDs greatly exceed the number of original template molecules to minimize the probability that two different original templates acquired the same UID. The UID assignment PCR cycles included Phusion Hot Start II (NEB) in a 45 uL reaction containing 1× Phusion HF buffer, 0.25 mM dNTPs, 0.5 uM each forward (containing 12-14 Ns) and reverse primers, and 2U of Phusion polymerase. To keep the final template concentrations <1.5 ng/uL, multiple wells were used to create some libraries. The following cycling conditions were employed: one incubation of 98° C. for 30 seconds (to activate the Phusion Hot Start II); and two cycles of 98° C. for 10 seconds, 61° C. for 120 seconds, and 72° C. for 10 seconds. To ensure complete removal of the first round primers, each well was digested with 60 U of a single strand DNA specific nuclease (Exonuclease-I; Enzymatics) at 37° C. for 1 hour. After a 5 minute heat-inactivation at 98° C., primers complementary to the introduced universal tags (Table 57) were added to a final concentration of 0.5 uM each. These primers contained two terminal phosphorothioates to make them resistant to any residual Exonuclease-I activity. They also contained 5′ grafting sequences necessary for hybridization to the Illumina GA IIx flow cell. Finally, they contained an index sequence between the grafting sequence and the universal tag sequence. This index sequence enables the PCR products from multiple different individuals to be simultaneously analyzed in the same flow cell compartment of the sequencer. The following cycling conditions were used for the subsequent 25 cycles of PCR: 98° C. for 10 seconds and 72° C. for 15 seconds. No intermediate purification steps were performed in an effort to reduce the losses of template molecules.
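The UID counts quoted above follow from four possible bases at each random position (4¹² ≈ 16.8 million and 4¹⁴ ≈ 268 million). The Python sketch below reproduces that arithmetic and adds a rough estimate of how often a template would share its UID with another template. The estimate assumes that every UID is equally likely and that templates are tagged independently; this is an idealization introduced here for illustration, not a claim made in the text, and the function names are hypothetical.

```python
# Hypothetical helpers illustrating why the UID space should greatly exceed
# the number of original template molecules.

def distinct_uids(n_random_bases):
    """Number of distinct UIDs from n random nucleotides (4 choices each)."""
    return 4 ** n_random_bases

def prob_template_shares_uid(n_templates, n_random_bases):
    """Probability that a given template shares its UID with at least one
    other template, assuming uniform, independent UID assignment."""
    n_uids = distinct_uids(n_random_bases)
    return 1.0 - (1.0 - 1.0 / n_uids) ** (n_templates - 1)

if __name__ == "__main__":
    for n_bases in (12, 14):
        print(n_bases, distinct_uids(n_bases),
              prob_template_shares_uid(10_000, n_bases))
```

Under these assumptions the sharing probability grows roughly in proportion to the number of templates and shrinks 16-fold when the UID is lengthened from 12 to 14 bases.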

After the second round of amplification, wells were consolidated andpurified using a Qiagen QIAquick PCR Purification Kit (cat. no. 28104)and eluted in 50 uL EB Buffer (Qiagen). Fragments of the expected sizewere purified after agarose (mtDNA libraries) or polyacrylamide (allother libraries) gel electrophoresis. For agarose gel purification, theeight 6-uL aliquots were loaded into wells of a 2% Size Select Gel(Invitrogen) and bands of the expected size were collected in EB Bufferas specified by the manufacturer. For polyacrylamide gel purification,ten 5-uL aliquots were loaded into wells of a 10% TBE Polyacrylamide Gel(Invitrogen). Gel slices containing the fragments of interest wereexcised, crushed, and eluted as described elsewhere (see, e.g., Durbinet al., 2010 Nature 467:1061-1073).

Analysis of Phusion Polymerase Fidelity

Amplification of a fragment of human genomic DNA within the BMX (RefSeq Accession NM_203281.2) gene was first performed using the PCR conditions described above. The template was diluted so that an average of one template molecule was present in every 10 wells of a 96-well PCR plate. Fifty uL PCR reactions were then performed in 1× Phusion HF buffer, 0.25 mM dNTPs, 0.5 uM each forward and reverse primers (Table 57), and 2U of Phusion polymerase. The cycling conditions were one cycle of 98° C. for 30 seconds; and 19 cycles of 98° C. for 10 seconds, 61° C. for 120 seconds, and 72° C. for 10 seconds. The primers were removed by digestion with 60 U of Exonuclease-I at 37° C. for 1 hour followed by a 5 minute heat-inactivation at 98° C. No purification of the PCR product was performed, either before or after Exonuclease-I digestion. The entire contents of each well were then used as templates for the exogenous UIDs strategy described above.

Sequencing

Sequencing of all the libraries described above was performed using an Illumina GA IIx instrument as specified by the manufacturer. The total length of the reads used for each experiment varied from 36 to 73 bases. Base-calling and sequence alignment were performed with the Eland pipeline (Illumina). Only high quality reads meeting the following criteria were used for subsequent analysis: (i) the first 25 bases passed the standard Illumina chastity filter; (ii) every base in the read had a quality score ≥20; and (iii) <3 mismatches to expected sequences. For the exogenous UID libraries, we additionally required the UIDs to have a quality score ≥30. A relatively high frequency of errors was noticed at the ends of the reads in the endogenous UID libraries prepared with the standard Illumina protocol, presumably introduced during shearing or end-repair, so the first and last three bases of these tags were excluded from analysis.

Safe-SeqS Analysis

High quality reads were grouped into UID-families based on theirendogenous or exogenous UIDs. Only UID-families with two or more memberswere considered. Such UID-families included the vast majority (≥99%) ofthe sequencing reads. To ensure that the same data was used for bothconventional and Safe-SeqS analysis, UID-families containing only onemember were also excluded from conventional analysis. Furthermore, abase was only identified as “mutant” in conventional sequencing analysisif the same variant was identified in at least two members of at leastone UID-family (i.e., two mutations) when comparing conventionalanalysis to that of Safe-SeqS with exogenous UIDs. For comparison withSafe-SeqS with endogenous UIDs, we required at least two members of eachof two UID-families (i.e., four mutations) to identify a position as“mutant” in conventional analysis. With either endogenous or exogenousUIDs, a super-mutant was defined as a UID-family in which ≥95% ofmembers shared the identical mutation. Thus, UID-families with <20members had to be 100% identical at the mutant position, while a 5%combined replication and sequencing error rate was permitted inUID-families with more members. To determine polymerase fidelity usingSafe-SeqS, and to compare the results with previous analyses of Phusionpolymerase fidelity, it was necessary to realize that the previousanalyses would only detect mutations present in both strands of the PCRproducts (see, e.g., Shibata, 2011 Carcinogenesis 32:123-128). Thiswould be equivalent to analyzing PCR products generated with one lesscycle with Safe-SeqS, and the appropriate correction was made in Table53A. Unless otherwise specified, all values listed in the text andTables represent means and standard deviations.
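The super-mutant rule described above can be sketched as follows. This is a minimal illustration with hypothetical names, in which each read is reduced to a (UID, base call) pair at a single position; it is not the analysis software used in the study. Integer arithmetic is used for the 95% threshold so that a family of exactly 20 members with 19 mutant reads is accepted, while any family with fewer than 20 members must be unanimous, matching the statement in the text.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified model of UID-family grouping and super-mutant calling.

def group_into_families(reads, min_members=2):
    """Group (uid, base) pairs by UID, keeping families with >= min_members."""
    families = defaultdict(list)
    for uid, base in reads:
        families[uid].append(base)
    return {uid: bases for uid, bases in families.items()
            if len(bases) >= min_members}

def is_super_mutant(bases, ref_base, min_percent=95):
    """True if >= min_percent of family members share the identical mutation."""
    variants = Counter(base for base in bases if base != ref_base)
    if not variants:
        return False
    _, count = variants.most_common(1)[0]
    return count * 100 >= min_percent * len(bases)

def count_super_mutants(reads, ref_base):
    """Number of UID-families at this position that qualify as super-mutants."""
    families = group_into_families(reads)
    return sum(is_super_mutant(bases, ref_base) for bases in families.values())
```

A family in which only a minority of members carry the change, as expected for an error introduced during a late PCR cycle or during sequencing, is not counted.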

Results

Endogenous UIDs

UIDs, sometimes called barcodes or indexes, can be assigned to nucleicacid fragments in many ways. These include the introduction of exogenoussequences through PCR or ligation. Even more simply, randomly shearedgenomic DNA inherently contains UIDs consisting of the sequences of thetwo ends of each sheared fragment (FIG. 62 and FIG. 65). Paired-endsequencing of these fragments yields UID-families that can be analyzedas described above. To employ such endogenous UIDs in Safe-SeqS, twoseparate approaches were used: one designed to evaluate many genessimultaneously and the other designed to evaluate a single gene fragmentin depth (FIG. 62 and FIG. 65, respectively).

For the evaluation of multiple genes, standard Illumina sequencing adapters were ligated to the ends of sheared DNA fragments to produce a standard sequencing library, and genes of interest were then captured on a solid phase. In this experiment, a library made from the DNA of ~15,000 normal cells was used, and 2,594 bp from six genes were targeted for capture. After excluding known single nucleotide polymorphisms, 25,563 apparent mutations, corresponding to 2.4×10⁻⁴ mutations/bp, were identified (Table 52). Based on previous analyses of mutation rates in human cells, at least 90% of these apparent mutations were likely to represent mutations introduced during template and library preparation or base-calling errors. Note that the error rate determined here (2.4×10⁻⁴ mutations/bp) is considerably lower than usually reported in experiments using the Illumina instrument because we used very stringent criteria for base calling.

With Safe-SeqS analysis of the same data, it was determined that 69,505original template molecules were assessed in this experiment (i.e.,69,505 UID-families, with an average of 40 members per family, wereidentified, Table 52). All of the polymorphic variants identified byconventional analysis were also identified by Safe-SeqS. However, only 8super-mutants were observed among these families, corresponding to3.5×10⁻⁶ mutations/bp. Thus Safe-SeqS decreased the presumptivesequencing errors by at least 70-fold.

Safe-SeqS analysis can also determine which strand of a template is mutated; thus, an additional criterion for calling mutations could require that the mutation appear in only one or in both strands of the originally double stranded template. Massively parallel sequencers are able to obtain sequence information from both ends of a template in two sequential reads. (This type of sequencing experiment is called a “paired end” run on the Illumina platform, but similar experiments can be done on other sequencing platforms where they may be called by another name.) The two strands of a double stranded template can be differentiated by the observed orientation of the sequences and the order in which they appear when sequence information is obtained from both ends. For example, a UID strand pair could consist of the following two groups of sequences when each end of a template is sequenced in sequential reads: 1) A sequence in the sense orientation that begins at position 100 of chromosome 2 in the first read followed by a sequence in the antisense orientation that begins at position 400 of chromosome 2 in the second read; and 2) A sequence in the antisense orientation that begins at position 400 of chromosome 2 in the first read followed by a sequence in the sense orientation that begins at position 100 of chromosome 2 in the second read. In the capture experiment described above, 42,222 of 69,505 UIDs (representing 21,111 original double stranded molecules) in the region of interest represented UID strand pairs. These 42,222 UIDs encompassed 1,417,838 bases in the region of interest. When allowing a mutation to only occur within UID strand pairs (whether in one or both strands), two super-mutants were observed, yielding a mutation rate of 1.4×10⁻⁶ super-mutants/bp. When requiring that a mutation occur in only one strand of a UID strand pair, only one super-mutant was observed, yielding a mutation rate of 7.1×10⁻⁷ super-mutants/bp. When requiring that a mutation occur in both strands of a UID strand pair, only one super-mutant was observed, yielding a mutation rate of 7.1×10⁻⁷ super-mutants/bp. Thus, requiring that mutations occur in only one or in both strands of templates can further increase the specificity of Safe-SeqS.
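The pairing of endogenous UIDs into UID strand pairs, and the rates quoted above, can be illustrated with the following sketch. The tuple layout `(chromosome, first-read position, first-read orientation, second-read position, second-read orientation)` and the function names are assumptions made for the example; only the chromosome 2 coordinates and the base and super-mutant counts are taken from the text.

```python
from collections import defaultdict


def fragment_key(uid):
    """Return an order-independent key for the fragment a paired-end UID came from."""
    chrom, pos1, orient1, pos2, orient2 = uid
    ends = tuple(sorted([(pos1, orient1), (pos2, orient2)]))
    return (chrom, ends)


def find_strand_pairs(uids):
    """Map each fragment key to its two mirrored UIDs (one per strand)."""
    by_fragment = defaultdict(set)
    for uid in uids:
        by_fragment[fragment_key(uid)].add(uid)
    # Two distinct UIDs with the same ends must be mirror images of each other.
    return {key: members for key, members in by_fragment.items() if len(members) == 2}


# The example from the text: sense at position 100 then antisense at 400,
# and the mirrored read order from the complementary strand.
uids = [
    ("chr2", 100, "+", 400, "-"),
    ("chr2", 400, "-", 100, "+"),
    ("chr2", 150, "+", 300, "-"),  # hypothetical UID with no partner strand observed
]
pairs = find_strand_pairs(uids)
print(len(pairs))  # 1

# Rates reported above, recomputed from the stated counts.
bases_in_strand_pairs = 1_417_838
print(f"{2 / bases_in_strand_pairs:.1e}")  # 1.4e-06
print(f"{1 / bases_in_strand_pairs:.1e}")  # 7.1e-07
```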

A strategy employing endogenous UIDs was also used to reduce false positive mutations upon deep sequencing of a single region of interest. In this case, a library prepared as described above from 1,750 normal cells was used as template for inverse PCR employing primers complementary to a gene of interest, so the PCR products could be directly used for sequencing (FIG. 65). With conventional analysis, an average of 2.3×10⁻⁴ mutations/bp were observed, similar to that observed in the capture experiment (Table 52). Given that only 1,057 independent molecules from normal cells were assessed in this experiment, as determined through Safe-SeqS analysis, all mutations observed with conventional analysis likely represented false positives (Table 52). With Safe-SeqS analysis of the same data, no super-mutants were identified at any position.

Exogenous UIDs

Though the results described above show that Safe-SeqS can increase the reliability of massively parallel sequencing, the number of different molecules that can be examined using endogenous UIDs is limited. For fragments sheared to an average size of 150 bp (range 125-175), 36 base paired-end sequencing can evaluate a maximum of ~7,200 different molecules containing a specific mutation (2 reads × 2 orientations × 36 bases/read × 50 base variation on either end of the fragment). In practice, the actual number of UIDs is smaller because the shearing process is not entirely random.
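The upper bound quoted above follows directly from the four stated factors. The short calculation below reproduces it; the function and parameter names are illustrative only.

```python
def max_endogenous_uids(reads=2, orientations=2, read_length=36, end_variation=50):
    """Upper bound on distinguishable endogenous UIDs covering one mutation.

    The defaults are the values given in the text: 2 reads, 2 orientations,
    36 bases per read, and 50 bases of variation in fragment end position
    (fragment sizes ranging from 125 to 175 bp).
    """
    return reads * orientations * read_length * end_variation


print(max_endogenous_uids())  # 7200
```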

To make more efficient use of the original templates, a Safe-SeqS strategy was developed that employed a minimum number of enzymatic steps. This strategy also permitted the use of degraded or damaged DNA, such as found in clinical specimens or after bisulfite-treatment for the examination of cytosine methylation. As depicted in FIG. 63, this strategy employs two sets of PCR primers. The first set is synthesized with standard phosphoramidite precursors and contains sequences complementary to the gene of interest on the 3′ end and different tails at the 5′ ends of both the forward and reverse primers. The different tails allowed universal amplification in the next step. Finally, there was a stretch of 12 to 14 random nucleotides between the tail and the sequence-specific nucleotides in the forward primer. The random nucleotides form the UIDs. An equivalent way to assign UIDs to fragments, not used in this study, would employ 10,000 forward primers and 10,000 reverse primers synthesized on a microarray. Each of these 20,000 primers would have gene-specific primers at their 3′-ends and one of 10,000 specific, predetermined, non-overlapping UID sequences at their 5′-ends, allowing for 10⁸ (i.e., [10⁴]²) possible UID combinations. In either case, two cycles of PCR are performed with the primers and a high-fidelity polymerase, producing a uniquely tagged, double-stranded DNA fragment from each of the two strands of each original template molecule (FIG. 63). The residual, unused UID assignment primers are removed by digestion with a single-strand specific exonuclease, without further purification, and two new primers are added. Alternatively or in addition to such digestion, one can use a silica column that selectively retains larger-sized fragments or one can use solid phase reversible immobilization (SPRI) beads under conditions that selectively retain larger fragments to eliminate smaller, non-specific, amplification artifacts. This purification may potentially help in reducing primer-dimer accumulation in later steps. The new primers, complementary to the tails introduced in the UID assignment cycles, contain grafting sequences at their 5′ ends, permitting solid-phase amplification on the Illumina instrument, and phosphorothioate residues at their 3′ ends to make them resistant to any remaining exonuclease. Following 25 additional cycles of PCR, the products are loaded on the Illumina instrument. As shown below, this strategy allowed us to evaluate the majority of input fragments and was used for several illustrative experiments.
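The number of distinct exogenous UIDs available under the two schemes described above can be computed as shown below, together with a sketch of how a forward UID assignment primer is laid out (5′ tail, random UID, gene-specific 3′ end). The tail and gene-specific sequences in the example are arbitrary placeholders, not sequences from this disclosure, and the function names are hypothetical.

```python
import random


def uid_space(n_random_bases):
    """Number of distinct UIDs encoded by a stretch of random nucleotides."""
    return 4 ** n_random_bases


def make_forward_primer(tail, gene_specific, uid_length=12, rng=None):
    """Assemble a forward primer, written 5' to 3', as tail + UID + gene-specific end.

    Returns the primer and the UID it carries. Sequences passed in are
    caller-supplied; nothing here reflects the primers actually used.
    """
    rng = rng or random.Random()
    uid = "".join(rng.choice("ACGT") for _ in range(uid_length))
    return tail + uid + gene_specific, uid


print(uid_space(12))   # 16777216
print(uid_space(14))   # 268435456
print(10_000 ** 2)     # 100000000 combinations for the microarray alternative

# Placeholder sequences for illustration only.
primer, uid = make_forward_primer("T" * 10, "ACGTACGTAC", uid_length=12,
                                  rng=random.Random(0))
print(len(primer), len(uid))  # 32 12
```

With 12 to 14 random bases the UID space exceeds the number of template molecules in the experiments described here by a wide margin, which is what allows each template strand to receive an effectively unique tag.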

Analysis of DNA Polymerase Fidelity

Measurement of the error rates of DNA polymerases is essential for their characterization and dictates the situations in which these enzymes can be used. The error rate of Phusion polymerase was measured, as this polymerase has one of the lowest reported error frequencies of any commercially available enzyme and therefore poses a particular challenge for an in vitro-based approach. A single human DNA template molecule, comprising a segment of an arbitrarily chosen human gene, was first amplified through 19 rounds of PCR. The PCR products from these amplifications, in their entirety, were used as templates for Safe-SeqS as described in FIG. 63. In seven independent experiments of this type, the number of UID-families identified by sequencing was 624,678±421,274, which is consistent with an amplification efficiency of 92±9.6% per round of PCR.

The error rate of Phusion polymerase, estimated through cloning of PCR products encoding β-galactosidase in plasmid vectors and transformation into bacteria, is reported by the manufacturer to be 4.4×10⁻⁷ errors/bp/PCR cycle. Even with very high stringency base-calling, conventional analysis of the Illumina sequencing data revealed an apparent error rate of 9.1×10⁻⁶ errors/bp/PCR cycle, more than an order of magnitude higher than the reported Phusion polymerase error rate (Table 53A). In contrast, Safe-SeqS of the same data revealed an error rate of 4.5×10⁻⁷ errors/bp/PCR cycle, nearly identical to that measured for Phusion polymerase in biological assays (Table 53A). The vast majority (>99%) of these errors were single base substitutions (Table 54A), consistent with previous data on the mutation spectra created by other prokaryotic DNA polymerases (Tindall et al. 1988 Biochemistry 27:6008-6013; de Boer et al., 1988 Genetics 118:181-191; Eckert et al., 1990 Nucleic Acids Res 18:3739-3744).

Safe-SeqS also allowed a determination of the total number of distinct mutational events and an estimation of the PCR cycle in which each mutation occurred. There were 19 cycles of PCR performed in wells containing a single template molecule in these experiments. If a polymerase error occurred in cycle 19, there would be only one super-mutant produced (from the strand containing the mutation). If the error occurred in cycle 18 there should be two super-mutants (derived from the mutant strands produced in cycle 19), etc. Accordingly, the cycle in which the error occurred is related to the number of super-mutants containing that error. The data from seven independent experiments demonstrate a relatively consistent number of observed total polymerase errors (2.2±1.1×10⁻⁶ distinct mutations/bp), in good agreement with the expected number of observations from simulations (1.5±0.21×10⁻⁶ distinct mutations/bp). The data also show a highly variable timing of occurrence of polymerase errors among experiments (Table 55). This kind of information is difficult to derive using conventional analysis of the same next-generation sequencing data, in part because of the prohibitively high apparent mutation rate noted above.
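The relationship between the cycle in which an error arises and the number of super-mutants carrying it can be written out as follows. This sketch assumes perfect doubling in every cycle after the error and complete recovery of all descendant strands, which is an idealization; with the measured amplification efficiency below 100%, real counts would be lower. The function names are illustrative.

```python
import math


def expected_super_mutants(error_cycle, total_cycles=19):
    """Idealized number of super-mutants for an error arising in a given cycle.

    An error in the last cycle yields 1 super-mutant, an error one cycle
    earlier yields 2, and so on, doubling with each remaining cycle.
    """
    if not 1 <= error_cycle <= total_cycles:
        raise ValueError("error_cycle must lie between 1 and total_cycles")
    return 2 ** (total_cycles - error_cycle)


def estimate_error_cycle(n_super_mutants, total_cycles=19):
    """Invert the relationship above to estimate the cycle of origin."""
    if n_super_mutants < 1:
        raise ValueError("at least one super-mutant is required")
    return total_cycles - math.log2(n_super_mutants)


print(expected_super_mutants(19))  # 1
print(expected_super_mutants(18))  # 2
print(estimate_error_cycle(8))     # 16.0
```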

Analysis of Oligonucleotide Composition

A small number of mistakes during the synthesis of oligonucleotides from phosphoramidite precursors are tolerable for most applications, such as routine PCR or cloning. However, for synthetic biology, wherein many oligonucleotides must be joined together, such mistakes present a major obstacle to success. Clever strategies for making the gene construction process more efficient have been devised (see, e.g., Kosuri et al., 2010 Nat Biotechnol 28:1295-129; and Matzas et al., 2010 Nat Biotechnol 28:1291-1294), but all such strategies would benefit from more accurate synthesis of the oligonucleotides themselves. Determining the number of errors in synthesized oligonucleotides is difficult because the fraction of oligonucleotides containing errors can be lower than the sensitivity of conventional next-generation sequencing analyses.

To determine whether Safe-SeqS could be used for this determination, standard phosphoramidite chemistry was used to synthesize an oligonucleotide containing 31 bases that were designed to be identical to that analyzed in the polymerase fidelity experiment described above. In the synthetic oligonucleotide, the 31 bases were surrounded by sequences complementary to primers that could be used for the UID assignment steps of Safe-SeqS (FIG. 63). By performing Safe-SeqS on 300,000 oligonucleotides, it was found that there were 8.9±0.28×10⁻⁴ super-mutants/bp and that these errors occurred throughout the oligonucleotides (FIG. 66A). The oligonucleotides contained a large number of insertion and deletion errors, representing 8.2±0.63% and 25±1.5% of the total super-mutants, respectively. Importantly, both the position and nature of the errors were highly reproducible among seven independent replicates of this experiment performed on the same batch of oligonucleotides (FIG. 66A). This nature and distribution of errors had little in common with that of the errors produced by Phusion polymerase (FIG. 66B and Table 56), which were distributed in the expected stochastic pattern among replicate experiments. The number of errors in the oligonucleotides synthesized with phosphoramidites was ~60 times higher than in the equivalent products synthesized by Phusion polymerase. These data, in toto, indicate that the vast majority of errors in the former were generated during their synthesis rather than during the Safe-SeqS procedure.

Does Safe-SeqS preserve the ratio of mutant:normal sequences in the original templates? To address this question, two 31-base oligonucleotides of identical sequence with the exception of nt 15 (50:50 C/G instead of T) were synthesized and mixed at nominal mutant/normal fractions of 3.3% and 0.33%. Through Safe-SeqS analysis of the oligonucleotide mixtures, it was found that the ratios were 2.8% and 0.27%, respectively. Thus, the UID assignment and amplification procedures used in Safe-SeqS do not greatly alter the proportion of variant sequences and thereby provide a reliable estimate of that proportion when unknown. This is also supported by the reproducibility of variant fractions when analyzed in independent Safe-SeqS experiments (FIG. 66A).

Analysis of DNA Sequences from Normal Human Cells

The exogenous UID strategy (FIG. 63) was then used to determine the prevalence of rare mutations in a small region of the CTNNB1 gene from ~100,000 normal human cells from three unrelated individuals. Through comparison with the number of UID-families obtained in the Safe-SeqS experiments (Table 53B), it was calculated that the majority (78±9.8%) of the input fragments were converted into UID-families. There was an average of 68 members/UID-family, easily fulfilling the required redundancy for Safe-SeqS (FIG. 67). Conventional analysis of the Illumina sequencing data revealed an average of 118,488±11,357 mutations among the ~560 Mb of sequence analyzed per sample, corresponding to an apparent mutation prevalence of 2.1±0.16×10⁻⁴ mutations/bp (Table 53B). Only an average of 99±78 super-mutants were observed in the Safe-SeqS analysis. The vast majority (>99%) of super-mutants were single base substitutions and the calculated mutation rate was 9.0±3.1×10⁻⁶ mutations/bp (Table 54B). Safe-SeqS thereby reduced the apparent frequency of mutations in genomic DNA by at least 24-fold (FIG. 64).

One possible strategy to increase the specificity of Safe-SeqS is to perform the library amplification (and possibly the UID assignment cycles) in multiple wells. This can be accomplished in as few as 2 or as many as 384 wells using standard PCR plates, or scaled up to many more wells when using a microfluidic device (thousands to millions). When performed this way, indexing sequences can be introduced into the templates that are unique to the wells in which the template is amplified. Rare mutations, thus, should give rise to two super-mutants (i.e., one from each strand), both with the same well index sequence. When performing Safe-SeqS with exogenous UIDs on the CTNNB1 templates described above and diluted into 10 wells (each well yielding templates amplified with a different index sequence), the mutation rate was further reduced from 9.0±3.1×10⁻⁶ to 3.7±1.2×10⁻⁶ super-mutants/bp. Thus, analyzing templates in multiple compartments, in a manner that yields differentially encoded templates based on the compartment in which templates were amplified, may be an additional strategy to increase the specificity of Safe-SeqS.
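The well-index criterion described above can be sketched as a filter over super-mutant calls: a variant is retained only if at least two distinct UID-families carrying it share the same well index. The record layout `(well_index, uid, position, base)`, the function name `confirm_by_well`, and the `min_families` parameter are assumptions made for this illustration, not the procedure actually used.

```python
from collections import defaultdict


def confirm_by_well(super_mutants, min_families=2):
    """Keep variants supported by multiple UID-families within one well.

    super_mutants: iterable of (well_index, uid, position, base) tuples
    (hypothetical format). Returns a sorted list of (well_index, position, base).
    """
    uids_by_variant = defaultdict(set)
    for well, uid, position, base in super_mutants:
        uids_by_variant[(well, position, base)].add(uid)
    return sorted(
        variant
        for variant, uids in uids_by_variant.items()
        if len(uids) >= min_families
    )


# Toy calls: the first variant is seen in two UID-families from well w1
# (as expected for the two strands of one template); the second is seen once.
calls = [
    ("w1", "uidA", 50, "T"),
    ("w1", "uidB", 50, "T"),
    ("w2", "uidC", 77, "G"),
]
print(confirm_by_well(calls))  # [('w1', 50, 'T')]
```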

Analysis of DNA Sequences from Mitochondrial DNA

The identical strategy was applied to a short segment of mitochondrial DNA in ~1,000 cells from each of seven unrelated individuals. Conventional analysis of the Illumina sequencing libraries produced with the Safe-SeqS procedure (FIG. 63) revealed an average of 30,599±12,970 mutations among the ~150 Mb of sequence analyzed per sample, corresponding to an apparent mutation prevalence of 2.1±0.94×10⁻⁴ mutations/bp (Table 53C). Only 135±61 super-mutants were observed in the Safe-SeqS analysis. As with the CTNNB1 gene, the vast majority of mutations were single base substitutions, though occasional single base deletions were also observed (Table 54C). The calculated mutation rate in the analyzed segment of mtDNA was 1.4±0.68×10⁻⁵ mutations/bp (Table 53C). Safe-SeqS thereby reduced the apparent frequency of mutations in genomic DNA by at least 15-fold.

Example 9: Detection of Genetic and Protein Biomarkers in Combination with Detection of Aneuploidy

Samples from a number of patients were tested for the presence of genetic biomarkers (NRAS, CTNNB1, PIK3CA, FBXW7, APC, EGFR, BRAF, CDKN2A, PTEN, FGFR2, HRAS, KRAS, AKT1, TP53, PPP2R1A, and/or GNAS) and protein biomarkers (CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, and/or MPO). The same samples were also tested for the presence of aneuploidy using WALDO methods described herein. The results are shown in Table 59. As can be seen, the genetic and protein biomarker test can complement the aneuploidy test (e.g., some patients are negative for the genetic and protein biomarker test while being positive for aneuploidy, and vice versa), such that the presence of cancer can be more accurately and completely detected using both tests.

Once a subject is identified as having cancer by the genetic and protein biomarker test, the aneuploidy test, or both, the subject can undergo further diagnostic testing and/or increased monitoring (e.g., any of the variety of further diagnostic testing and/or increased monitoring methods described herein) and/or be administered a therapeutic intervention (e.g., any of the variety of therapeutic interventions described herein).

REFERENCES

Certain of the following references are referred to herein. The contents of each of the following references is incorporated herein by reference in its entirety.

-   AlHilli et al., Incidence and factors associated with synchronous    ovarian and endometrial cancer: a population-based case-control    study. Gynecologic oncology 125, 109-113 (2012).-   Allen P J, et al. (2017) Multi-institutional Validation Study of the    American Joint Commission on Cancer (8th Edition) Changes for T and    N Staging in Patients With Pancreatic Adenocarcinoma. Ann Surg    265(1):185-191.-   Allory Y, Beukers W, Sagrera A, Flandez M, Marques M, Marquez M, van    der Keur K A, Dyrskjot L, Lurkin I, Vermeij M, Carrato A, Lloreta J,    Lorente J A, Carrillo-de Santa Pau E, Masius R G, Kogevinas M,    Steyerberg E W, van Tilborg A A, Abas C, Orntoft T F, Zuiverloon T    C, Malats N, Zwarthoff E C, Real F X (2014) Telomerase reverse    transcriptase promoter mutations in bladder cancer: high frequency    across stages, detection in urine, and lack of association with    outcome. Eur Urol 65:360-366.-   Andre T, et al. (2009) Improved overall survival with oxaliplatin,    fluorouracil, and leucovorin as adjuvant treatment in stage II or    III colon cancer in the MOSAIC trial. J Clin Oncol 27(19):3109-3116.-   Anglesio et al., Cancer-Associated Mutations in Endometriosis    without Cancer. N Engl J Med 376, 1835-1848 (2017).-   Ansari D, et al. (2017) Relationship between tumour size and outcome    in pancreatic ductal adenocarcinoma. Br J Surg 104(5):600-607.-   Arnold et al., G. A. Stevens, M. Ezzati, J. Ferlay, J. J.    Miranda, I. Romieu, R. Dikshit, D. Forman, I. Soerjomataram, Global    burden of cancer attributable to high body-mass index in 2012: a    population-based study. The Lancet. Oncology 16, 36-46 (2015).-   Bahuva R, Walsh R M, Kapural L, & Stevens T (2013) Morphologic    abnormalities are poorly predictive of visceral pain in chronic    pancreatitis. Pancreas 42(1):6-10.-   Bansal N, Gupta A, Sankhwar S N, Mandi A A (2014) Low- and    high-grade bladder cancer appraisal via serum-based proteomics    approach. 
Clin Chim Acta 436:97-103.-   Barkan G A, Wojcik E M, Nayar R, Savic-Prince S, Quek M L, Kurtycz D    F, Rosenthal D L (2016) The Paris System for Reporting Urinary    Cytology: The Quest to Develop a Standardized Terminology. Adv Anat    Pathol 23:193-201.-   Bettegowda C, et al. (2014) Detection of circulating tumor DNA in    early- and late-stage human malignancies. Science translational    medicine 6(224):224ra224.-   Biankin A V, et al. (2012) Pancreatic cancer genomes reveal    aberrations in axon guidance pathway genes. Nature    491(7424):399-405.-   Bozic I, et al. (2013) Evolutionary dynamics of cancer in response    to targeted combination therapy. Elife 2:e00747.-   Buys et al., Effect of screening on ovarian cancer mortality: the    Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening    Randomized Controlled Trial. JAMA 305, 2295-2303 (2011).-   Cancer Genome Atlas Research Network (2014) Comprehensive molecular    characterization of urothelial bladder carcinoma. Nature    507:315-322.-   Capello M, et al. (2017) Sequential Validation of Blood-Based    Protein Biomarker Candidates for Early-Stage Pancreatic Cancer. J    Natl Cancer Inst 109(4).-   Chai H, Brown R E (2009) Field effect in cancer—an update. Ann Clin    Lab Sci 39:331-337-   Chari S T, et al. (2005) Probability of pancreatic cancer following    diabetes: a population-based study. Gastroenterology 129(2):504-511.-   Cheng L, Montironi R, Lopez-Beltran A (2017) TERT Promoter Mutations    Occur Frequently in Urothelial Papilloma and Papillary Urothelial    Neoplasm of Low Malignant Potential. Eur Urol 71:497-498.-   Cheung et al., High frequency of PIK3R1 and PIK3R2 mutations in    endometrial cancer elucidates a novel mechanism for regulation of    PTEN protein stability. Cancer Discov 1, 170-185 (2011).-   Clarke-Pearson D L (2009) Clinical practice. Screening for ovarian    cancer. 
N Engl J Med 361(2):170-177.-   Coombs et al., Therapy-Related Clonal Hematopoiesis in Patients with    Non-hematologic Cancers Is Common and Associated with Adverse    Clinical Outcomes. Cell Stem Cell, (2017).-   Cowan M L, Springer S, Nguyen D, Taheri D, Guner G, Mendoza    Rodriguez M A, Wang Y, Kinde I, Del Carmen Rodriguez Pena M,    VandenBussche C J, Olson M T, Cunha I, Fujita K, Ertoy D, Kinzler K,    Bivalacqua T, Papadopoulos N, Vogelstein B, Netto G J (2016)    Detection of TERT promoter mutations in primary adenocarcinoma of    the urinary bladder. Hum Pathol 53:8-13.-   Davis R, Jones J S, Barocas D A, Castle E P, Lang E K, Leveillee R    J, Messing E M, Miller S D, Peterson A C, Turk T M, Weitzel W,    American Urological Association (2012) Diagnosis, evaluation and    follow-up of asymptomatic microhematuria (AMH) in adults: AUA    guideline. J Urol 188:2473-2481.-   Dawson S J, et al. (2013) Analysis of circulating tumor DNA to    monitor metastatic breast cancer. N Engl J Med 368(13):1199-1209.-   Di Renzo M F, et al. (1995) Overexpression and amplification of the    met/HGF receptor gene during the progression of colorectal cancer.    Clin Cancer Res 1(2):147-154.-   Di Renzo M F, Poulsom R, Olivero M, Comoglio P M, & Lemoine N    R (1995) Expression of the Met/hepatocyte growth factor receptor in    human pancreatic cancer. Cancer Res 55(5):1129-1138.-   Dimashkieh H, Wolff D J, Smith T M, Houser P M, Nietert P J, Yang    J (2013) Evaluation of urovysion and cytology for bladder cancer    detection: a study of 1835 paired urine samples with clinical and    histologic correlation. Cancer Cytopathol 121:591-597.-   Dong T, Liu C C, Petricoin E F, & Tang L L (2014) Combining markers    with and without the limit of detection. Stat Med 33(8):1307-1320.-   Dressman D, Yan H, Traverso G, Kinzler K W, & Vogelstein B (2003)    Transforming single DNA molecules into fluorescent magnetic    particles for detection and enumeration of genetic variations. 
Proc    Natl Acad Sci USA 100(15):8817-8822.-   Dy G K, et al. (2009) Long-term survivors of metastatic colorectal    cancer treated with systemic chemotherapy alone: a North Central    Cancer Treatment Group review of 3811 patients, N0144. Clin    Colorectal Cancer 8(2):88-93.-   Eckert et al., Genomics of Ovarian Cancer Progression Reveals    Diverse Metastatic Trajectories Including Intraepithelial Metastasis    to the Fallopian Tube. Cancer Discov 6, 1342-1351 (2016).-   Egawa S, et al. (2004) Clinicopathological aspects of small    pancreatic cancer. Pancreas 28(3):235-240.-   Ellinger J, Muller S C, Dietrich D (2015) Epigenetic biomarkers in    the blood of patients with urological malignancies. Expert Rev Mol    Diagn 15:505-516.-   El-Tanani M K, et al. (2006) The regulation and role of osteopontin    in malignant transformation and cancer. Cytokine Growth Factor Rev    17(6):463-474.-   Erickson et al., Detection of somatic TP53 mutations in tampons of    patients with high-grade serous ovarian cancer. Obstetrics and    gynecology 124, 881-885 (2014).-   Fishman et al., The role of ultrasound evaluation in the detection    of early-stage epithelial ovarian cancer. Am J Obstet Gynecol 192,    1214-1221; discussion 1221-1212 (2005).-   Forbes et al., COSMIC: somatic cancer genetics at high-resolution.    Nucleic Acids Res 45, D777-D783 (2017).-   Forshew T, et al. (2012) Noninvasive identification and monitoring    of cancer mutations by targeted deep sequencing of plasma DNA.    Science translational medicine 4(136):136ra168.-   Fradet Y, Lockhard C (1997) Performance characteristics of a new    monoclonal antibody test for bladder cancer: ImmunoCyt trade mark.    Can J Urol 4:400-405.-   Frokjaer J B, Olesen S S, & Drewes A M (2013) Fibrosis, atrophy, and    ductal pathology in chronic pancreatitis are associated with    pancreatic function but independent of symptoms. 
Pancreas    42(7):1182-1187.-   Geldenhuys, Murray, Sensitivity and specificity of the Pap smear for    glandular lesions of the cervix and endometrium. Acta cytologica 51,    47-50 (2007).-   Genovese G, et al. (2014) Clonal hematopoiesis and blood-cancer risk    inferred from blood DNA sequence. N Engl J Med 371(26):2477-2487.-   Gerlinger et al., Intratumor heterogeneity and branched evolution    revealed by multiregion sequencing. N Engl J Med 366, 883-892    (2012).-   Gilbert et al., Assessment of symptomatic women for early diagnosis    of ovarian cancer: results from the prospective DOvE pilot project.    The Lancet. Oncology 13, 285-291 (2012).-   Goodison S, Chang M, Dai Y, Urquidi V, Rosser C J (2012) A    multi-analyte assay for the non-invasive detection of bladder    cancer. PLoS One 7:e47469.-   Gopalakrishna A, Fantony J J, Longo T A, Owusu R, Foo W C, Dash R,    Denton B T, Inman B A (2017) Anticipatory Positive Urine Tests for    Bladder Cancer. Ann Surg Oncol 24:1747-1753.-   Haber D A & Velculescu V E (2014) Blood-based analyses of cancer:    circulating tumor cells and circulating tumor DNA. Cancer Discov    4(6):650-661.-   Hajdinjak T (2008) UroVysion FISH test for detecting urothelial    cancers: meta-analysis of diagnostic accuracy and comparison with    urinary cytology testing. Urol Oncol 26:646-651-   Hamilton et al., Uterine papillary serous and clear cell carcinomas    predict for poorer survival compared to grade 3 endometrioid corpus    cancers. British journal of cancer 94, 642-646 (2006).-   Herbst R S, Heymach J V, & Lippman S M (2008) Lung cancer. N Engl J    Med 359(13):1367-1380.-   Horn S, Figl A, Rachakonda P S, Fischer C, Sucker A, Gast A, Kadel    S, Moll I, Nagore E, Hemminki K, Schadendorf D, Kumar R (2013) TERT    promoter mutations in familial and sporadic melanoma. Science    339:959-961.-   Howlader et al., SEER Cancer Statistics Review, 1975-2014, National    Cancer Institute. (2017).-   Howlader N, et al. 
(2016) SEER Cancer Statistics Review, 1975-2013,    National Cancer Institute. Bethesda, Md.,    http://seer.cancer.gov/csr/1975_2013/, based on November 2015 SEER    data submission, posted to the SEER web site, April 2016.-   Huang A C, et al. (2017) T-cell invigoration to tumour burden ratio    associated with anti-PD-1 response. Nature 545(7652):60-65.-   Huang F W, Hodis E, Xu M J, Kryukov G V, Chin L, Garraway L A (2013)    Highly recurrent TERT promoter mutations in human melanoma. Science    339:957-959.-   Hurst C D, Platt F M, Knowles M A (2014) Comprehensive mutation    analysis of the TERT promoter in bladder cancer and detection of    mutations in voided urine. Eur Urol 65:367-369.-   Ikematsu S, et al. (2000) Serum midkine levels are increased in    patients with various types of carcinomas. Br J Cancer    83(6):701-706.-   International Agency for Research on Cancer. (2016) WHO    Classification of Tumours of the Urinary System and Male Genital    Organs. World Health Organization; 4 edition-   Ishikawa O, et al. (1999) Minute carcinoma of the pancreas measuring    1 cm or less in diameter—collective review of Japanese case reports.    Hepatogastroenterology 46(25):8-15-   Jacobs et al., Ovarian cancer screening and mortality in the U K    Collaborative Trial of Ovarian Cancer Screening (UKCTOCS): a    randomised controlled trial. Lancet 387, 945-956 (2016).-   Jacobs et al., Sensitivity of transvaginal ultrasound screening for    endometrial cancer in postmenopausal women: a case-control study    within the UKCTOCS cohort. The Lancet. Oncology 12, 38-48 (2011).-   Jaiswal S, et al. (2014) Age-related clonal hematopoiesis associated    with adverse outcomes. N Engl J Med 371(26):2488-2498.-   Jones S, et al. (2008) Core signaling pathways in human pancreatic    cancers revealed by global genomic analyses. Science    321(5897):1801-1806.-   Jung et al., Intron retention is a widespread mechanism of    tumor-suppressor inactivation. 
Nat Genet 47, 1242-1248 (2015).-   Jung K W, et al. (2007) Clinicopathological aspects of 542 cases of    pancreatic cancer: a special emphasis on small pancreatic cancer. J    Korean Med Sci 22 Suppl:S79-85.-   K. N. Moore, A. N. Fader, Uterine papillary serous carcinoma. Clin    Obstet Gynecol 54, 278-291 (2011).-   Kalinich M, et al. (2017) An RNA-based signature enables high    specificity detection of circulating tumor cells in hepatocellular    carcinoma. Proc Natl Acad Sci USA 114(5):1123-1128.-   Kandoth et al., Integrated genomic characterization of endometrial    carcinoma. Nature 497, 67-73 (2013).-   Karst et al., Modeling high-grade serous ovarian carcinogenesis from    the fallopian tube. Proc Natl Acad Sci USA 108, 7547-7552 (2011).-   Kawauchi S, Sakai H, Ikemoto K, Eguchi S, Nakao M, Takihara H,    Shimabukuro T, Furuya T, Oga A, Matsuyama H, Takahashi M, Sasaki    K (2009) 9p21 Index as Estimated by Dual-Color Fluorescence in Situ    Hybridization is Useful to Predict Urothelial Carcinoma Recurrence    in Bladder Washing Cytology. Hum Pathol 40:1783-1789.-   Khadra M R, Pickard R S, Charlton M, Powell P H, Neal D E (2000) A    prospective analysis of 1,930 patients with hematuria to evaluate    current diagnostic practice. J Urol 163:524-527.-   Killela P J, Reitman Z J, Jiao Y, Bettegowda C, Agrawal N, Diaz L A,    Jr, Friedman A H, Friedman H, Gallia G L, Giovanella B C, Grollman A    P, He T C, He Y, Hruban R H, Jallo G I, Mandahl N, Meeker A K,    Mertens F, Netto G J, Rasheed B A, Riggins G J, Rosenquist T A,    Schiffman M, Shih I, Theodorescu D, Torbenson M S, Velculescu V E,    Wang T L, Wentzensen N, Wood L D, Zhang M, McLendon R E, Bigner D D,    Kinzler K W, Vogelstein B, Papadopoulos N, Yan H (2013) TERT    promoter mutations occur frequently in gliomas and a subset of    tumors derived from cells with low rates of self-renewal. Proc Natl    Acad Sci USA 110:6021-6026.-   Kim J E, et al. 

What is claimed is:
 1. A method of evaluating a subject for the presence of any of a plurality of cancers in a subject, comprising: detecting in a biological sample obtained from the subject the presence of one or more driver gene mutations in one or more driver genes, wherein each driver gene is associated with the presence of a cancer in the plurality of cancers; thereby evaluating the subject for the presence of any of the plurality of cancers, wherein the number of driver gene mutations detected is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each driver gene is associated is not substantially increased by the detection of one or more additional driver gene mutations.
 2. The method of claim 1, wherein detecting the one or more driver gene mutations comprises sequencing one or more regions of interest or amplicons comprising the driver gene mutation.
 3. The method of claim 2, wherein the number of regions of interest or amplicons sequenced is sufficient such that the sensitivity of detection of the cancer in the plurality of cancers with which each driver gene is associated is not substantially increased by sequencing one or more additional regions of interest or amplicons.
 4. The method of claim 1, wherein the plurality of cancers comprises 4, 5, 6, 7 or 8 cancers.
 5. The method of claim 1, wherein the plurality of cancers is chosen from two or more of liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer.
 6. The method of claim 3, wherein at least 30 and not more than 400 regions of interest or amplicons from the driver genes are sequenced.
 7. The method of claim 3, wherein each region of interest or amplicon comprises 6-800 bp.
 8. The method of claim 3, wherein the number of regions of interest or amplicons sequenced is at least 500 bp and no more than 3000 bp.
 9. The method of claim 1, wherein at least 6 bp and no more than 300 bp in each driver gene is sequenced.
 10. The method of claim 1, wherein: (i) the subject has not yet been determined to have a cancer, (ii) the subject has not yet been determined to harbor a cancer cell, or (iii) the subject does not exhibit, or has not exhibited, a symptom associated with a cancer.
 11. The method of claim 1, wherein the one or more driver genes are chosen from a gene disclosed in Table 60 or 61.
 12. The method of claim 1, wherein the one or more driver genes comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, or 16 genes chosen from Tables 60 and 61.
 13. The method of claim 12, wherein the one or more driver genes comprise one or more of KRAS, PIK3CA, HRAS, CDKN2A, TP53, AKT1, CTNNB1, APC, EGFR, GNAS, PPP2R1A, BRAF, FBXW7, PTEN, or FGFR2.
 14. The method of claim 1, wherein the cancer of any of the plurality of cancers is chosen from: liver cancer, ovarian cancer, esophageal cancer, stomach cancer, pancreatic cancer, colorectal cancer, lung cancer, breast cancer, or prostate cancer.
 15. The method of claim 1, further comprising: a) detecting the level of each of one or more protein biomarkers in the biological sample, wherein the level of each protein biomarker is associated with the presence of a cancer of the plurality of cancers; and b) identifying the presence of a cancer of the plurality of cancers in the subject when the presence of one or more protein biomarkers is detected.
 16. The method of claim 15, further comprising comparing the detected levels of each protein biomarker to a reference level for the protein biomarker.
 17. The method of claim 1, wherein the biological sample comprises one or more of: (i) a tumor sample, a circulating tumor DNA sample, a solid tumor biopsy sample, or a fixed tumor sample, (ii) a blood sample, (iii) an apheresis sample, (iv) a cell-free DNA sample, or (v) a protein sample.
 18. The method of claim 15, wherein the protein biomarker comprises one or more of: CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or MPO.
 19. The method of claim 1, wherein detecting the presence of one or more driver gene mutations comprises: a. assigning a unique identifier (UID) to each of a plurality of template molecules present in the sample; b. amplifying each uniquely tagged template molecule to create UID-families; and c. redundantly sequencing the amplification products.
 20. A method of evaluating a subject for the presence of any of a plurality of cancers in a subject, comprising: (a) detecting in a biological sample obtained from the subject the presence of one or more driver gene mutations in one or more driver genes, wherein the one or more driver genes comprise one or more of KRAS, PIK3CA, HRAS, CDKN2A, TP53, AKT1, CTNNB1, APC, EGFR, GNAS, PPP2R1A, BRAF, FBXW7, PTEN, or FGFR2, and wherein each driver gene is associated with the presence of a cancer in the plurality of cancers; and (b) detecting the level of one or more protein biomarkers in a biological sample, wherein the one or more protein biomarkers comprise one or more of CA19-9, CEA, HGF, OPN, CA125, prolactin, TIMP-1, or MPO, and wherein the level of each protein biomarker is associated with the presence of a cancer of the plurality of cancers, thereby evaluating the subject for the presence of any of the plurality of cancers, wherein the presence of a cancer of the plurality of cancers is identified when the presence of one or more driver gene mutations and the level of one or more of the protein biomarkers is detected.
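
By way of non-limiting illustration only, the UID-based detection steps recited in claim 19 (assigning a UID to each template molecule, amplifying into UID-families, and redundantly sequencing) might be followed by a downstream analysis along the lines of the sketch below. The sketch assumes that reads have already been reduced to (UID, sequence) pairs. The function names, the minimum family size of 2, and the 95% agreement threshold are hypothetical choices made for this example and are not recited in the claims.

```python
from collections import Counter, defaultdict


def build_uid_families(reads):
    """Group (uid, sequence) reads so that each UID maps to its family of reads."""
    families = defaultdict(list)
    for uid, seq in reads:
        families[uid].append(seq)
    return dict(families)


def family_consensus(seqs, min_size=2, min_agreement=0.95):
    """Return the dominant sequence of a family, or None if the family is
    too small or too discordant (thresholds are assumed, not from the claims)."""
    if len(seqs) < min_size:
        return None
    seq, count = Counter(seqs).most_common(1)[0]
    if count / len(seqs) >= min_agreement:
        return seq
    return None


def call_supermutants(reads, reference, min_size=2, min_agreement=0.95):
    """Report UID families whose consensus sequence differs from the reference."""
    calls = {}
    for uid, seqs in build_uid_families(reads).items():
        consensus = family_consensus(seqs, min_size, min_agreement)
        if consensus is not None and consensus != reference:
            calls[uid] = consensus
    return calls


# Toy data: four template molecules, each tagged with a hypothetical UID.
reads = [
    ("AAAT", "ACGT"), ("AAAT", "ACGT"), ("AAAT", "ACGT"),  # wild-type family
    ("CCGA", "ACTT"), ("CCGA", "ACTT"),                    # concordant mutant family
    ("GGTA", "ACGT"), ("GGTA", "ACTT"),                    # discordant family, rejected
    ("TTAC", "ACTT"),                                      # singleton, rejected
]
print(call_supermutants(reads, "ACGT"))  # {'CCGA': 'ACTT'}
```

In this toy example only the family tagged CCGA is reported, because a variant seen in a single read or in a discordant family is more plausibly an amplification or sequencing error than a mutation present in the original template.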